ABSTRACT

We present the Mock Map Facility, a powerful tool to convert theoretical outputs 
of hierarchical galaxy formation models into catalogues of virtual observations. The 
general principle is straightforward : mock observing cones can be generated using 
semi-analytically post-processed snapshots of cosmological N-body simulations. These 
cones can then be projected to synthesise mock sky images. To this end, the paper 
describes in detail an efficient technique to create such mock cones and images from 
the GALICS semi-analytic model, providing the reader with an accurate quantification 
of the artifacts it introduces at every step. We show that replication effects introduce a 
negative bias on the clustering signal - typically peaking at less than 10 percent around 
the correlation length. We also thoroughly discuss how the clustering signal is affected 
by finite volume effects, and show that it vanishes at scales larger than about a tenth of 
the simulation box size. For the purpose of analysing our method, we show that number 
counts and redshift distributions obtained with GALICS/momaf compare well to K- 
band observations and to the 2dFGRS. Given finite volume effects, we also show that 
the model can reproduce the APM angular correlation function. The momaf results 
discussed here are made publicly available to the astronomical community through 
a public database. Moreover, a user-friendly Web interface (http://galics.iap.fr) 
allows any user to recover her/his own favourite galaxy samples through simple SQL 
queries. The flexibility of this tool should permit a variety of uses ranging from exten- 
sive comparisons between real observations and those predicted by hierarchical models 
of galaxy formation, to the preparation of observing strategies for deep surveys and 
tests of data processing pipelines. 

Key words: astronomical data bases:miscellaneous - galaxies:statistics - large-scale 
structure of Universe - methods: numerical 



1 INTRODUCTION 

Large galaxy surveys in which homogeneous datasets are ac- 
quired and analysed stand to benefit from the availability of 
mock images of galaxy wide/deep fields. These images are 
very useful to design observational strategies, and to put 
predictions of various models into a format that can be di- 
rectly compared to actual observations. Such an activity of 
"sky simulation" is now running extensively as part of the 
data processing centres for satellite missions and for the next 
generation of ground-based instruments, because it is now 
acknowledged that data processing and mock observations 
have to be integrated into instrument building from the beginning
of the project to the interpretation of the data. However, such sky
simulations are not easy, because they have to meet a certain level of
realism to be useful. The purpose of
this paper is to demonstrate how realistic galaxy catalogues 
and mock images can be synthesised from the outputs of a 
model of hierarchical galaxy formation. 

Of course, mock galaxy surveys can be generated simply 
by drawing galaxy types, luminosities and sizes from their 
distribution functions. However, such an approach does not 
meet the requirements, because (i) evolution cannot be in- 
cluded except in a very crude way, especially if we are to 
mimic number evolution, (ii) multi-wavelength surveys can- 
not be addressed easily, and (iii) spatial information cannot 
be addressed since galaxy positions are only known with 
a Poissonian distribution. Another approach is to use observed
fields rescaled on purpose (Bouwens et al. 1998). A
third approach is to start from theoretical priors and use
numerical simulations to describe hierarchical clustering. In 
this approach, the main issue is how to transform mass to 
light and implement a robust method to get galaxies within 
dark matter structures. 

There are basically two paths that this theoretical approach
can take: (i) biasing schemes, and (ii) halo models. The simplest
approach (i) consists of "painting" galaxies on dark matter (DM)
simulations using phenomenological prescriptions to pick DM
particles to be galaxies. Such methods, based on the linear bias
formalism, use the smoothed density field only (Cole et al. 1998)
and are thus very efficient for big, low-resolution simulations
(e.g. the Hubble Volume simulations described by Evrard et al.
2002). The more subtle approach (ii) uses halos identified in DM
simulations. Several variations exist. In the simplest
implementation, halos are populated with galaxies according to a
given halo occupation distribution (Peacock & Smith 2000;
Scoccimarro et al. 2001; Berlind & Weinberg 2002), which may depend
on luminosity (Yang et al. 2004). In a more sophisticated approach,
a Monte Carlo scheme is used to build a merger history tree for
each halo identified in the simulation, and a semi-analytic model
(SAM) is used to evolve galaxies in these trees (e.g. Kauffmann et
al. 1997; Benson et al. 2000). In this approach, the SAM is used to
generate a physically motivated halo occupation distribution.
Finally, a hybrid approach can be used, which extracts halo merging
history trees from the dark matter simulations, and uses a SAM to
evolve galaxies in these (Kauffmann et al. 1999; Helly et al. 2003;
Hatton et al. 2003, hereafter GALICS I). Mock catalogues built with
this technique have been made to mimic the CfA redshift survey
(Diaferio et al. 1999) or the DEEP2 survey (Coil et al. 2001). As a
matter of fact, all the implementations of the two above methods
are closely linked because the bias formalism often relies on
analysis of the SAMs themselves (e.g. Cole et al. 1998; Somerville
et al. 2001). They must therefore be considered as complementary
rather than competing. The spirit of the physically motivated
semi-analytic recipes can be extended to other objects such as
X-ray clusters (Evrard et al. 2002). However, these approaches have
generally been designed to fulfil the needs of a specific survey
(for instance 2dF for Cole et al. 1998, or DEEP2 for Coil et al.
2001). This limited scope can be extended in at least three ways.

First, it would be interesting to elaborate a generic ap- 
proach to address the construction of mock observing cones 
from the outputs of N-body simulations at various cosmic 
times. The main issue is that, depending on the depth and 
solid angle of the observing cone, the finite size of the box 
may call for box replication along the line-of-sight (hereafter
radial replication), and box replication perpendicularly to
the line of sight, at the same cosmic time (hereafter transverse
replication). Wide field, shallow surveys, with negligible
evolution, can be constructed mainly from a single box
(the last output of a simulation corresponding to $t = t_0$).
In contrast, deep, pencil-beam surveys generally have to use 
numerous radial replications, whereas they may avoid trans- 
verse replication. Several issues have to be addressed here : 
the effects replication might have on mock catalogues, the 
effect using a finite volume might have on catalogues, and 
the sensitivity of catalogues to the number of time outputs
of the root simulation. Of course, using a larger box size
would improve the situation, but given finite computer resources
(CPU time and memory), using a larger box would require a trade-off
in the mass resolution of the simulation, which is not acceptable
if galaxies are to be modelled with a sufficient level of realism.
A Hubble volume would be the ideal situation avoiding any radial or
transverse replication, but so far, the largest volume simulation
(ΛCDM with 3000 $h^{-1}$ Mpc on a side, and $10^9$ particles) has
a particle mass of $m_p = 2.25 \times 10^{12}\,h^{-1}\,M_\odot$
(Evrard et al. 2002), much too large to address galaxy formation
with any of the "hybrid models". For instance, the ΛCDM simulation
used in GALICS I has only 100 $h^{-1}$ Mpc on a side, and
$m_p = 5.51 \times 10^9\,h^{-1}\,M_\odot$; yet resolution effects
are visible for galaxies fainter than $L_*/8$ at z = 0. While we
await a three orders of magnitude improvement of the simulations,
addressing the replication issues is unavoidable if one wants
to mimic large-volume observations with high resolution.

Second, the mock catalogues are useful if they gather 
together a large number of potentially observable properties. 
For instance, it is obvious that a mock catalogue designed 
to prepare and analyse a redshift survey of a magnitude- 
limited sample in a given photometric band, will incorpo- 
rate at least the predicted redshifts and apparent magni- 
tudes in that band. But the redshift survey will also be used 
for follow-up at other wavelengths, and other studies (for 
instance spectral classification once the spectra are properly 
calibrated). A good mock catalogue will be able to provide 
all these pieces of information at wavelength bands different 
from those of the original survey. Ultimately the mock cat- 
alogue will enable the production of field images at many 
wavelengths, making source extraction using the same data 
processing pipeline as the actual observations possible. 

Third, the catalogues quickly become very large, and 
the question of accessibility to relevant information becomes 
crucial. Generally they are made available on Web pages as 
ASCII tables, mostly as galaxy catalogues from snapshots, 
more rarely as galaxy catalogues from observing cones. The 
more realistic these tables try to be (by including many 
galaxies with many properties), the more difficult to read 
and use they become, because of their growing size. The 
solution to this problem is to make the catalogues acces- 
sible through a database that can be queried to make ad 
hoc sub-samples fitting specific needs within a wide range 
of possibilities. 

The purpose of this paper is to contribute along these 
three lines, by (i) presenting a package called momaf (for 
Mock Map Facility) that generates observing cones from the 
outputs of our GALICS model, and (ii) discussing in detail 
the limitations of the method. From these observing cones, 
synthetic catalogues are generated, that can be easily re- 
lated to the catalogues of galaxies in the snapshots. The 
catalogues gather together a large number of properties, in- 
cluding magnitudes in many photometric bands of interest. 
These GALICS/momaf catalogues are made available in an 
on-line database that can be queried through a simple Web 
interface at http://galics.iap.fr.

To illustrate our technique, we use examples drawn from
the ΛCDM simulation and the GALICS post-processing described
in GALICS I. This simulation is a compromise in terms
of mass resolution and volume size, and gives a satisfactory
description of the luminosity functions over typically 5 magnitudes.
However, the techniques we describe are generic,
and can be used for larger simulations. In this study, we do
not address in detail the drawbacks of our model (see GALICS I)
in terms of mass resolution or limited volume, nor the
quality of its predictions. We are only interested in how these
predictions can be converted into mock observations. First
examples of using these predictions and the mock catalogues
can be found in Devriendt et al. 2004 (hereafter GALICS II,
in preparation) and Blaizot et al. (2004, hereafter GALICS III).
Two other papers will address issues which are more relevant
to mock images: multi-wavelength faint galaxy counts
(GALICS IV), and correlation functions (GALICS V).

This paper is organised as follows. Section 2 summarises
the main features of the GALICS model that are relevant to
our study. In Section 3, we describe our technique of catalogue
and map building from the simulation snapshots. In
Section 4, we explore the different limitations of our method,
most of which are actually general enough to apply to other
mock catalogues in the literature based on the tiling method.
We explain in Section 5 how all the products of GALICS and
momaf are stored in a relational database accessible from
the web, and illustrate a few key features of this database.
Section 6 contains a discussion about how these mock cat- 
alogues and images may be used, and presents perspectives 
for further developments. 



2 THE GALICS MODEL 

GALICS (for Galaxies In Cosmological Simulations) is a 
model of hierarchical galaxy formation which combines high 
resolution cosmological simulations to describe the dark 
matter content of the Universe with semi-analytic prescrip- 
tions to deal with the baryonic matter. This hybrid approach 
is fully described in GALICS I and GALICS II and we only 
briefly recall its relevant features here. 

2.1 Dark matter simulation 

The cosmological N-body simulation we refer to throughout
this paper was done using the parallel tree-code developed
by Ninin (1999). It is a flat cold dark matter model
with a cosmological constant ($\Omega_m = 0.333$, $\Omega_\Lambda = 0.667$).
The simulated volume is a cube of side $L_{\rm box} = 100\,h_{100}^{-1}$ Mpc,
with $h_{100} = 0.667$, containing $256^3$ particles of mass
$8.272 \times 10^9\,M_\odot$, with a smoothing length of 29.29 kpc. The power
spectrum was set in agreement with the present day abundance
of rich clusters ($\sigma_8 = 0.88$, from Eke, Cole & Frenk
1996), and we followed the DM density field from z = 35.59
to z = 0, outputting 100 snapshots spaced logarithmically in
the expansion factor.
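
As a rough illustration of this output spacing, the sketch below generates 100 redshifts evenly spaced in the logarithm of the expansion factor a = 1/(1 + z). It assumes that both end points (z = 35.59 and z = 0) are included, which the text does not state explicitly; `snapshot_redshifts` is a hypothetical helper name, not part of GALICS.

```python
import math

def snapshot_redshifts(z_start=35.59, z_end=0.0, n_snap=100):
    """Redshifts of n_snap outputs evenly spaced in log(a), with a = 1/(1+z).

    Assumption: both end points are included in the list.
    """
    log_a_start = -math.log(1.0 + z_start)
    log_a_end = -math.log(1.0 + z_end)
    step = (log_a_end - log_a_start) / (n_snap - 1)
    # z = 1/a - 1 = exp(-log a) - 1
    return [math.exp(-(log_a_start + i * step)) - 1.0 for i in range(n_snap)]

redshifts = snapshot_redshifts()
```

With this convention, the ratio (1 + z) between consecutive outputs is constant, so the outputs are much more closely spaced in redshift at low z than at high z.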

In each snapshot we use a friend-of-friend algorithm to
identify virialised groups of more than 20 particles, thus setting
the minimum dark matter halo mass to $1.65 \times 10^{11}\,M_\odot$.
We compute a set of properties of these halos, including po- 
sition and velocity of the centre of mass, kinetic and poten- 
tial energies, and spin parameter. Then, assuming a density 
profile for the virialised dark matter, we compute the virial 
radius a spherical halo would have to have the same mass 
and potential energy, thus making the link to the idealised 
semi-analytic approach. 
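
The mass scales quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes the standard value of the critical density, about 2.775 × 10^11 h^2 M_sun Mpc^-3 (this constant is not given in the text), and recovers the particle mass and the 20-particle halo mass threshold to within about one per cent; `particle_mass` is a hypothetical helper.

```python
# Critical density in h^2 Msun / Mpc^3 (standard value, assumed here).
RHO_CRIT = 2.775e11

def particle_mass(omega_m, box_size, n_part):
    """Particle mass in h^-1 Msun for a box of side box_size (in h^-1 Mpc)."""
    return omega_m * RHO_CRIT * box_size ** 3 / n_part

h100 = 0.667
m_p_h = particle_mass(0.333, 100.0, 256 ** 3)  # in h^-1 Msun, about 5.5e9
m_p = m_p_h / h100                             # in Msun, about 8.3e9
m_halo_min = 20 * 8.272e9                      # in Msun, using the quoted particle mass
```

The small difference between `m_p` and the quoted 8.272 × 10^9 M_sun presumably comes from rounding of the cosmological parameters or of the critical density.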

Once all the halos are identified in each snapshot, we
compute their merging history trees, following the constituent
particles from one output to the next one. The merging
histories we obtain are far more complex than in semi-analytic
approaches, as they include evaporation of halos, fragmentation,
and several artifacts due to loose friend-of-friend
identifications. The way we deal with these is described in
detail in GALICS I.



2.2 Baryonic Prescriptions, or how mass turns 
into light 

When a halo is first identified, it is assigned a mass of hot
gas, assuming a universal baryonic to dark matter mass ratio
($\Omega_b = 0.045$ in our fiducial model). This hot gas is assumed
to be shock heated to the virial temperature of the halo, and 
in hydrostatic equilibrium within the dark matter potential 
well. The comparison of the cooling time of this gas to its 
free-fall time, as a function of the radius, yields the mass 
of gas that can cool to a central disc during a time-step. 
The size of this exponential disc is given by conservation of 
specific angular momentum during the gas in-fall and scales 
as the spin parameter of the halo. Then, the cooled gas is 
transformed into stars with a rate proportional to its mass 
divided by the disc dynamical time, with a given efficiency. 
The stars formed are distributed in mass according to an initial
mass function (IMF) taken from Kennicutt (1983). The
stellar population of each galaxy is then evolved between the 
time-steps, using a sub-stepping of at most 1 Myr. During 
each sub-step, stars release gas and metals in the ISM, and 
we follow this gas recycling in time, assuming instantaneous 
mixing. The massive end of the stellar population shortly 
explodes into supernovae which also release metals and en- 
ergy in the ISM or in the IGM. We model this as a function 
of the instantaneous star formation rate. 

When two halos merge, the galaxies they contain are 
gathered within the same final halo and their orbits per- 
turbed. Subsequently, due to dynamical friction or satellite- 
satellite collisions, they can possibly merge. A "new" galaxy 
is then formed (the descendant of the two progenitors) and 
the stars and gas of the progenitors are distributed in three 
components : a disc, a bulge, and a starburst, the amount 
of what goes where being fixed by the ratio of masses of the 
two progenitors. The new galaxy can be elliptical (in shape) 
if the two progenitor galaxies have about the same mass, or 
remain a spiral if one of the merged galaxies has negligible 
mass. 

The spectral energy distributions (SEDs) of our modelled
galaxies are computed by summing the contribution
of all the stars they contain, according to their age and
metallicity, both of which we keep track of all along the
simulation. Then, extinction is computed assuming a random
inclination for disc components, and the emission of
dust is added to the extinguished stellar spectra with STARDUST
(Devriendt et al. 1999). Finally, a mean correction for
absorption through the intergalactic medium (IGM) is implemented
following Madau (1995), before we convolve the
SEDs with the desired filters in the observer frame.

2.3 Resolution effects

The mass resolution of the DM simulations affects both the
physical and statistical properties of modelled galaxies in 
the following ways : 

(i) The particle mass of the cosmological simulation sets 
a minimum halo mass. Converting this halo mass into a 
galaxy mass, assuming that all the gas in the halo cools, 
one gets a threshold mass above which our sample of galax- 
ies is complete : the formal completeness limit. Below this 
mass, although we do have galaxies, our sample is not com- 
plete since we miss galaxies in undetected halos. This di- 
rect effect of mass resolution is responsible for the lack of 
dwarf galaxies in the standard GALICS model. To express the 
completeness limit in terms of magnitudes is not straight- 
forward because of the complex processes that convert mass 
into light. One can define a limiting magnitude, at a given 
redshift, such that, say, 95% of the galaxies brighter than 
that will be more massive than the formal mass resolution. 
Because there is no one-to-one relation between mass and 
luminosity, however, the luminosity selection is in practice 
more drastic than the selection on mass. As an example,
these magnitudes are given in several wave-bands at z = 0
and z = 3 in Table 1. They can easily be derived for other
wave-bands or redshifts from the GALICS database (see
Section 5).

(ii) In a Universe dominated by cold dark matter, small 
structures form first, and then merge and accrete material 
so as to evolve into larger haloes. In other words, the charac- 
teristic mass M* of the mass distribution of haloes increases 
as redshift decreases. The mass resolution of our numerical
simulation is fixed, however, and does not allow us to identify
objects less massive than $\sim 1.6 \times 10^{11}\,M_\odot$, at any redshift.
Hence, going back in time, more and more haloes are not
resolved, and one eventually reaches a point where no halo
can be detected. We call $z_{\rm lim}$ the limit redshift when this
happens. At higher redshifts, we miss all possible galaxies.
In our simulation, one finds $z_{\rm lim} \sim 7$.

(iii) A more subtle effect of resolution is that missing 
small structures means missing part of galaxies' histories. 
In practice, we showed that for our standard simulation, 
a galaxy needs to have evolved for about 1 Gyr before its 
properties have converged (see GALICS III). Although this is
virtually no constraint at z = 0, where most galaxies are
much older than 1 Gyr, the constraint becomes drastic at
z = 3, when the age of the universe is only about 2 Gyr.
To ease the selection of mature galaxies for users of the
database (see Section 5), we assign the morphological type
'Im' to immature galaxies.
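
The limiting magnitude introduced in point (i) can be illustrated with a short sketch. It assumes two parallel lists holding the absolute magnitudes and masses of the galaxies at a given redshift, and returns the faintest magnitude such that at least a fraction `frac` of the brighter galaxies lie above the mass resolution. The function name and its arguments are hypothetical and do not describe the actual GALICS or database interface.

```python
def limiting_magnitude(mags, masses, mass_res, frac=0.95):
    """Faintest magnitude M_lim such that at least `frac` of the galaxies
    brighter than (or equal to) M_lim have a mass above `mass_res`.

    Returns None if no magnitude satisfies the criterion.
    """
    pairs = sorted(zip(mags, masses))  # brightest (most negative) first
    n_resolved = 0
    m_lim = None
    for n_total, (mag, mass) in enumerate(pairs, start=1):
        if mass >= mass_res:
            n_resolved += 1
        # Fraction of resolved galaxies among the n_total brightest ones.
        if n_resolved / n_total >= frac:
            m_lim = mag
    return m_lim

# Toy sample: the three brightest galaxies are resolved, the two faintest are not.
example = limiting_magnitude([-22, -21, -20, -19, -18], [10, 10, 10, 1, 1], 5, frac=0.75)
```

In this toy sample the 75% limit falls at magnitude -19, one step fainter than the 95% limit, which is the behaviour expected of the two sets of limits listed in Table 1.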



3 MOCK OBSERVATIONS 

In this section, we explain how we convert the outputs of 
GALICS described above into mock observations. We first 
describe the inputs we need from GALICS or any other 
model/simulation of galaxy formation. Then, we show how 
these inputs are turned into mock maps, and point out the 
main limitation of our technique : replication effects. Finally, 
we briefly explain how we can project catalogues onto realistic
pre- or post-observing maps.

Table 1. 95% and 75% completeness limits in terms of absolute
rest-frame magnitudes at redshifts 0 and 3. At z = 0, the
magnitudes are expressed in the Vega system, and the filters are
Johnson's. At z = 3, the magnitudes are expressed in the AB
system, and the filters are those from Steidel & Hamilton (1993).



3.1 Inputs 

The method we developed to generate mock catalogues from 
outputs (or snapshots) of cosmological simulations at a finite 
number of redshifts is general and can be used for a variety 
of objects (e.g. clusters or quasars). Here, we describe the 
features needed in these snapshots for galaxies. 

The snapshots have to be (cubic) volumes of equal
comoving size (in our standard simulation, $L_{\rm box} = 100\,h^{-1}$ Mpc)
with periodic boundary conditions. These snapshots
must each contain the following information:

• The redshift, or expansion factor of the snapshot. 

• The position of each galaxy within the snapshot. 

• The velocity of each galaxy within the snapshot. 

• The characteristic scale-length of each component of 
each galaxy (disc, bulge and burst). 

• The inclination of each galaxy, which was used to com- 
pute its extinction. 

• The absolute AB magnitude of each galaxy in the desired
filters, computed in the observer frame as

$$M_{\nu_0}(z) = -2.5 \log\left(\frac{L_{\nu_0(1+z)}}{[10\,{\rm pc}]^2}\right) - 2.5 \log(1+z) + 48.6, \qquad (1)$$

where z is the redshift of the snapshot, and

$$L_{\nu_0(1+z)} = \int (1+z)\, f_\nu(\nu)\, L[\nu(1+z)]\, {\rm d}\nu \qquad (2)$$

is the luminosity of a galaxy at redshift z, through a normalised
filter response $f_\nu$. Note that because the peculiar
velocities or positions relative to the observer are not known
at this stage, we use the redshift z of the snapshot to compute
these magnitudes. This approximation will be corrected
for when we compute apparent magnitudes (see Sec. 3.2.2).
Also note that these magnitudes take into account extinction
by the intergalactic medium, computed at the redshift
of the snapshot.

• The first order derivatives of the above magnitudes with
redshift, in each filter. For galaxies in snapshot i, these
derivatives are estimated as

$$\frac{{\rm d}M}{{\rm d}z} = \frac{M[z(i-1)] - M[z(i)]}{z(i-1) - z(i)}, \qquad (3)$$

where z(i) is the redshift of snapshot i (in our convention,
z(i) < z(i − 1)), and M[z] the observer-frame absolute magnitude
assuming the galaxy is at redshift z (Eq. 1). Note that
this expression does not account for the evolution of galaxies,
as it involves the magnitudes of the same galaxy put
at different redshifts. Eq. 3 however captures K-correction
and variations of IGM extinction with z, which are the main
drivers of average variations of apparent properties with redshift
in mock catalogues (see Sec. 3.2.2 and 4.3.1).
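
The finite difference of Eq. 3 is simple enough to write down directly. The sketch below is only an illustration: the function name is hypothetical, and the two input magnitudes are assumed to have been computed beforehand for the same galaxy placed at the redshifts of snapshots i − 1 and i.

```python
def magnitude_derivative(mag_prev, mag_curr, z_prev, z_curr):
    """Eq. 3: first-order estimate of dM/dz for a galaxy of snapshot i.

    mag_prev: observer-frame absolute magnitude of the galaxy placed at z(i-1)
    mag_curr: observer-frame absolute magnitude of the same galaxy at z(i)
    z_prev, z_curr: redshifts z(i-1) and z(i), with z_curr < z_prev
    """
    return (mag_prev - mag_curr) / (z_prev - z_curr)

# A galaxy that is 0.1 mag fainter when moved from z = 0.10 to z = 0.12.
slope = magnitude_derivative(-19.9, -20.0, 0.12, 0.10)
```

Since both magnitudes refer to the same stellar population, the slope measures only the K-correction and IGM extinction, as stated above.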

All the above quantities are direct outputs of GALICS,
except for positions and velocities. These are computed as a
post-treatment, using information from the DM simulations
(positions and velocities of the halos) and from GALICS (orbital
radii). The position of a galaxy within a snapshot is
thus defined by $\mathbf{g} = \mathbf{h} + r_{\rm orb}\,\mathbf{u}$, where $\mathbf{h}$ is the position
of its host halo, $r_{\rm orb}$ the orbital radius of the galaxy, and $\mathbf{u}$
a normalised vector of random direction. The peculiar velocity
of a galaxy is defined as $\mathbf{v}_g = \mathbf{v}_h + \delta\mathbf{v}$, where $\mathbf{v}_h$ is
the peculiar velocity of its host halo, and $\delta\mathbf{v}$ is the peculiar
velocity of the galaxy within this halo. The amplitude of $\delta\mathbf{v}$
is drawn randomly from a Gaussian distribution of width
equal to the circular velocity of the halo, and its direction
is random. Note that the velocities of central galaxies are
taken to be that of the centre of mass of their host haloes.
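
A minimal sketch of this post-treatment is given below. It makes several assumptions that the text leaves open: the Gaussian has zero mean and a standard deviation equal to the circular velocity, random directions are isotropic, and a central galaxy is given an orbital radius of zero by the caller. The function names are hypothetical.

```python
import math
import random

def random_unit_vector(rng):
    """Isotropic unit vector: uniform in cos(theta) and in phi."""
    cos_t = rng.uniform(-1.0, 1.0)
    sin_t = math.sqrt(1.0 - cos_t ** 2)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (sin_t * math.cos(phi), sin_t * math.sin(phi), cos_t)

def place_galaxy(halo_pos, halo_vel, r_orb, v_circ, rng, central=False):
    """Return (position, velocity) of a galaxy inside its host halo."""
    u = random_unit_vector(rng)
    pos = tuple(h + r_orb * c for h, c in zip(halo_pos, u))  # g = h + r_orb u
    if central:
        # Central galaxies move with the centre of mass of their halo.
        return pos, tuple(halo_vel)
    amplitude = rng.gauss(0.0, v_circ)  # assumed: zero mean, sigma = v_circ
    w = random_unit_vector(rng)
    vel = tuple(vh + amplitude * c for vh, c in zip(halo_vel, w))  # v_g = v_h + dv
    return pos, vel

rng = random.Random(1)
pos, vel = place_galaxy((50.0, 50.0, 50.0), (100.0, 0.0, 0.0), 0.2, 200.0, rng)
```

Positions are not wrapped back into the periodic box here; a real implementation would presumably do so for galaxies whose host halo lies close to a box face.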



3.2 Mock Catalogues 

Such inputs, corresponding to the same simulated region of 
universe at different redshifts, will cause replication effects 
when piled in a mock light-cone, namely the regular repeti- 
tion of structures in mock catalogues or images. Transverse 
replications are due to the fact that the same volume, in the 
same state of evolution, is used several times to fill an ob- 
serving cone across the line of sight. Radial replications occur 
because the same volume, although taken at different cosmic 
times, is repeatedly used to fill the observing cone along the 
line of sight. Because the largest structures evolve slowly (i.e. 
over several time-steps), they will create pseudo-periodicity 
in mock catalogues or mock maps. In Fig. 1 (see also Fig. 2),
we show how replications create an artificial perspective
effect in catalogues (left hand side panel). Replication effects 
can be suppressed with the "random tiling" method (right 
hand side panel), which we describe here. 



3.2.1 Random tiling

Building a catalogue from the inputs described above consists
of distributing the simulated galaxies in an observing
cone, and computing their apparent properties in this
new geometry. First, we define a three-dimensional tiling
of cubic underlying boxes of side $L_{\rm box}$ (= 100 comoving
$h^{-1}$ Mpc). Then, we fill the underlying boxes inside the
light cone with galaxies in the following way:

• determine the time-steps $i = n, \ldots, n + k$ which will be
needed in order to fill the current underlying box, knowing
that time-step i will be used to fill the light cone between
[z(i − 1) + z(i)]/2 and [z(i) + z(i + 1)]/2;

• to each of these snapshots, apply the same transformation,
which is a random combination of the following
transformations:

- a shift of random amplitude (between 0 and $L_{\rm box}$) in
each of the three directions (x, y, z),

- a rotation of 0, $\pi/2$, $\pi$, or $3\pi/2$ around each axis,

- the inversion of one of the axes picked randomly (e.g.
$x \rightarrow -x$), or none;

• use the transformed positions and velocities of galaxies
to include them in the light cone and compute their apparent
properties;

• move on to the next underlying box, and repeat the
previous steps until the light cone is filled.

Figure 2. Illustration of the cone-making process. On the top
panel we show the result of a straightforward tiling of snapshots.
In this case, the structures (such as the yellow star) appear
repeatedly along the line of sight. The middle panel shows the effect
of the three types of transformation we apply randomly to each
underlying box. Thanks to rotations, translations, and inversions,
the underlying boxes are decorrelated one from another. On the
lower panel we show how it is possible to avoid re-shuffling
underlying boxes to generate a pencil-beam type field.
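
The random transformation applied to each underlying box in the list above can be sketched as follows. This is an illustration under stated assumptions, not the momaf code: rotations are taken about the box centre and applied in the order x, y, z, the inversion is applied after the rotations, and positions are wrapped periodically after the shift. All function names are hypothetical.

```python
import random

def draw_transformation(box_size, rng):
    """Draw one random transformation for an underlying box."""
    return {
        "shift": [rng.uniform(0.0, box_size) for _ in range(3)],
        "quarter_turns": [rng.randrange(4) for _ in range(3)],  # about x, y, z
        "flip_axis": rng.choice([None, 0, 1, 2]),               # None: no inversion
    }

def rotate_quarter(vec, axis, n_turns):
    """Rotate vec by n_turns quarter turns about the given coordinate axis."""
    a, b = [k for k in range(3) if k != axis]
    out = list(vec)
    for _ in range(n_turns % 4):
        out[a], out[b] = -out[b], out[a]
    return out

def apply_transformation(pos, vel, t, box_size):
    """Transform one galaxy; velocities are rotated and flipped but not shifted."""
    half = 0.5 * box_size
    p = [x - half for x in pos]  # coordinates relative to the box centre
    v = list(vel)
    for axis, n_turns in enumerate(t["quarter_turns"]):
        p = rotate_quarter(p, axis, n_turns)
        v = rotate_quarter(v, axis, n_turns)
    if t["flip_axis"] is not None:
        p[t["flip_axis"]] = -p[t["flip_axis"]]
        v[t["flip_axis"]] = -v[t["flip_axis"]]
    p = [(x + half + s) % box_size for x, s in zip(p, t["shift"])]
    return p, v

rng = random.Random(42)
t = draw_transformation(100.0, rng)
p1, v1 = apply_transformation([10.0, 20.0, 30.0], [1.0, 2.0, 3.0], t, 100.0)
```

The same dictionary `t` would be reused for every snapshot needed to fill a given underlying box, and a new one drawn for the next box. Because the box is periodic, the transformation preserves all separations inside a box and only breaks correlations between boxes.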

The first step allows a galaxy in the cone to be taken 
from the output box which has the closest redshift to the 
galaxy's redshift relative to the observer. This has the ad- 
vantage of picking galaxies at a stage of evolution as close 
as possible to that they would have if we had continuous 
outputs. 
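
Since the boundaries between time-steps are placed at the mid-points [z(i − 1) + z(i)]/2, selecting the time-step amounts to picking the snapshot nearest in redshift. A minimal sketch, with a hypothetical function name and a toy list of output redshifts:

```python
def snapshot_for_redshift(z, z_snap):
    """Index of the snapshot used for a galaxy at redshift z.

    z_snap lists the output redshifts. Snapshot i covers the interval between
    [z(i-1) + z(i)]/2 and [z(i) + z(i+1)]/2, i.e. it is the nearest one in z.
    """
    return min(range(len(z_snap)), key=lambda i: abs(z_snap[i] - z))

toy_outputs = [1.0, 0.5, 0.2, 0.0]  # decreasing redshift, as in the text
index = snapshot_for_redshift(0.3, toy_outputs)
```

Here a galaxy at z = 0.3 is taken from the output at z = 0.2, because the boundary with the z = 0.5 output lies at z = 0.35.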

In the second step, the shifting, rotating and inverting of
the underlying boxes is done to suppress replication effects.
The shuffling of the underlying boxes, outlined with thick
lines in Figure 2, decorrelates them from one another, thus
suppressing replication effects as well as any information
on scales larger than the box size. Although breaking the
continuity of the density field makes us lose a fraction of
spatial information (see Section 4.1), we chose this solution
because we have good control on this information loss.

For deep pencil-beam surveys it is possible to avoid
replication effects simply by choosing an appropriate line
of sight so that the light-cone will intersect different regions
of each underlying box. It is better in this configuration not
to shuffle the boxes so as to keep all the spatial information.
This is an option which is implemented in our code,
and illustrated on the lower panel of Figure 2. Note however
that, although the density field is continuous throughout the
cone, clustering information on scales larger than $L_{\rm box}$ is still
missing, because it is not contained in the DM simulations:
replication effects can be suppressed but not finite volume
effects (see Sec. 4.2).

Figure 1. Angular projection of mock catalogues of 5 × 5 square degrees (the "r.a." & "dec." labels are arbitrary labels for two orthogonal
directions on the mock sky). Each point represents a bright galaxy. The left hand side panel shows a catalogue in which the snapshots
were piled together without any randomisation. This leads to replication effects, similar to a perspective effect (in an expanding/evolving
universe). The right hand side panel shows a catalogue containing the same galaxies, which was made using the random tiling method.
All replication effects have disappeared.

It is also possible to choose the position of the observer,
relative to the first underlying box, in both options. This
allows one to test for cosmic variance on local sources, and is
useful to understand the statistical significance of the bright
end of galaxy counts.



3.2.2 Apparent magnitudes

Because we use a finite number of time-steps, (i) galaxies 
are picked at a cosmic time which is different from that cor- 
responding to their distance in the mock light cone, (ii) the 
SEDs are not convolved with the filters at the exact red- 
shifts, and (iii), IGM extinction is not computed for the cor- 
rect redshifts. Point (i) means that an individual galaxy is 
not taken at the stage of evolution it would have in the case 
of continuous outputs. However, this does not affect the sta- 
tistical properties of the mock catalogues because the overall 
galaxy population does not evolve much between time-steps, 
on average. This issue is also discussed in Sec. 4.3.1.

We correct for points (ii) and (iii) as follows. We define
corrected observer-frame absolute magnitudes as

$$M_{\rm cor} = M[z(i)] + \frac{{\rm d}M}{{\rm d}z} \times [z(d) - z(i)], \qquad (4)$$

where M[z(i)] is the observer-frame absolute magnitude
computed by GALICS at redshift z(i) of time-step i (Eq. 1),
z(d) is the redshift of the galaxy evaluated from its comoving
distance d to the observer in the mock light cone and taking
into account the peculiar velocity of the galaxy along the
line of sight, and dM/dz is defined in Eq. 3. Note that this
derivative only accounts for distance effects (K-correction
and IGM extinction) and not evolution (point (i) above).
The apparent magnitude of a galaxy is then obtained with
the luminosity distance $d_L$:

$$m = M_{\rm cor} + 5 \log\left(\frac{d_L}{10\,{\rm pc}}\right). \qquad (5)$$

Thanks to the first order correction of magnitudes, the
distribution of galaxies in apparent colour-colour plots is
continuous. This is especially important for colour selections of
distant galaxies as shown in Blaizot et al. (2004).
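
Eqs. 4 and 5 can be chained in a few lines. The sketch below is a simplified illustration: it assumes a flat model with the cosmological parameters of Sec. 2.1, integrates the comoving distance with a plain trapezoidal rule, and uses the same redshift for the magnitude correction and for the luminosity distance, whereas z(d) in the text also includes the line-of-sight peculiar velocity. The function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s

def luminosity_distance(z, omega_m=0.333, h100=0.667, n_steps=1000):
    """Luminosity distance in Mpc for a flat model with a cosmological constant."""
    omega_l = 1.0 - omega_m

    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + omega_l)

    dz = z / n_steps
    # Trapezoidal rule for the integral of dz / E(z) between 0 and z.
    integral = 0.5 * (inv_e(0.0) + inv_e(z))
    integral += sum(inv_e(k * dz) for k in range(1, n_steps))
    integral *= dz
    d_comoving = C_KM_S / (100.0 * h100) * integral
    return (1.0 + z) * d_comoving

def apparent_magnitude(m_abs_snap, dm_dz, z_snap, z_gal):
    """Apply the first-order correction (Eq. 4), then the distance modulus (Eq. 5)."""
    m_cor = m_abs_snap + dm_dz * (z_gal - z_snap)   # Eq. 4
    d_l_pc = luminosity_distance(z_gal) * 1.0e6     # Mpc -> pc
    return m_cor + 5.0 * math.log10(d_l_pc / 10.0)  # Eq. 5

m_app = apparent_magnitude(-20.0, 5.0, 0.10, 0.12)
```

The correction term vanishes for a galaxy lying exactly at the snapshot redshift, so the catalogue magnitudes vary smoothly across the boundaries between time-steps only to the extent that the first-order derivative is a good approximation.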

In Fig. 3, we show an example light cone with a detection
limit close to that of the 2dFGRS (Colless et al.
2001). Each point represents a galaxy with b < 19.5, and
the colours indicate apparent B − V colour of the galaxies.

3.3 Mock Maps 

Two types of maps are useful to address different issues : 

• pre-observation maps are a simple projection of a mock 
catalogue on the sky. The only additional assumption re- 
quired here is the functional form for the galaxy light profiles 
(e.g. an exponential disc). 

• post-observation maps include, in addition, realistic
modelling of the characteristics of the telescope/detector
combination (e.g. diffraction effects, readout noise, photon
shot noise). Where appropriate, atmospheric effects can also
be included (e.g. seeing, air glow). SkyMaker (Erben et al.
2001) is a useful tool for producing post-observation maps.

3.3.1 Pre-observation maps 

Figure 3. Mock 2dFGRS: each point represents a galaxy brighter
than b = 19.5 from a light-cone of 75 x 4 square degrees. The cone
was truncated at a comoving distance of 700 $h^{-1}$ Mpc, corresponding
to z ~ 0.23. The colours of the points indicate the apparent
B - V colours of the galaxies according to the colour table in the
upper-left corner.

Consistent with the modelling of galaxies in GALICS, we display
disc components with an exponential profile, and bulges
and starbursts with a Hernquist profile (Hernquist 1990,
equations 32-34). The profiles are truncated at about ten
times the component's half-mass radius, and the Hernquist
profile is also dimmed exponentially starting at five times
the half-mass radius. To gain speed, we build a series of face-on
disc and bulge templates on grids of different resolutions
(e.g. from 2 x 2 to 2048 x 2048 pixels), each one normalised to
unity. When adding a galaxy's contribution to the final map,
we choose the template whose resolution is just above that
of the final map. For bulges, which are assumed to be spherically
symmetric, the projection is straightforward, and only a
rescaling of the template to the component's size is required.
For discs, we first flatten the template to account
for inclination^1, and then rotate it to match the disc's
orientation. Finally, we project the transformed templates
onto the final map grid, multiplying each template by the flux
of the component it represents. The total flux on the final
map is thus the sum of the fluxes of all galaxies in the light
cone, except for galaxies truncated by the border of the
image.

Some aliasing effects appear because of the projection
of the tilted template grids on the final map, but these
will be washed out when the map is subsequently convolved with
a PSF. Since the aim of this tool is to produce pre-observation
maps (with a resolution that should be higher
than that of the final post-observation map), there is not much
point in correcting this effect via bilinear interpolation or
other CPU-expensive methods.

^1 This inclination is the same as that used in GALICS to compute
the extinction of light by dust.
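The template approach described above (normalise a face-on template, flatten it for inclination, rotate it, and deposit it on the map without interpolation) can be sketched as follows. This is only an illustration under stated assumptions, not the MOMAF implementation: the function names, the nearest-pixel deposit and the parameter `pix_ratio` (size of a template pixel in units of map pixels, at most 1 when the template is finer than the map) are all hypothetical.

```python
import numpy as np


def face_on_exponential_template(n_pix, scale_pix):
    """Face-on exponential disc on an n_pix x n_pix grid, normalised to unit sum."""
    centre = (n_pix - 1) / 2.0
    y, x = np.mgrid[0:n_pix, 0:n_pix]
    template = np.exp(-np.hypot(x - centre, y - centre) / scale_pix)
    return template / template.sum()


def add_disc_to_map(sky_map, template, flux, x0, y0, pix_ratio, cos_incl, pos_angle):
    """Flatten, rotate and deposit a template on sky_map (nearest pixel, no interpolation)."""
    n_pix = template.shape[0]
    centre = (n_pix - 1) / 2.0
    ty, tx = np.mgrid[0:n_pix, 0:n_pix]
    u = (tx - centre) * pix_ratio
    v = (ty - centre) * pix_ratio * cos_incl          # flattening by inclination
    xm = x0 + u * np.cos(pos_angle) - v * np.sin(pos_angle)
    ym = y0 + u * np.sin(pos_angle) + v * np.cos(pos_angle)
    ix = np.rint(xm).astype(int)
    iy = np.rint(ym).astype(int)
    inside = (ix >= 0) & (ix < sky_map.shape[1]) & (iy >= 0) & (iy < sky_map.shape[0])
    # np.add.at accumulates correctly when several template pixels hit one map pixel.
    np.add.at(sky_map, (iy[inside], ix[inside]), flux * template[inside])
    return sky_map


sky = np.zeros((64, 64))
tmpl = face_on_exponential_template(32, 4.0)
add_disc_to_map(sky, tmpl, 10.0, 32.0, 32.0, 0.5, 0.5, np.pi / 6)
total_inside = sky.sum()
edge = add_disc_to_map(np.zeros((64, 64)), tmpl, 10.0, 0.0, 32.0, 0.5, 0.5, np.pi / 6)
```

Since the template sums to one and every template pixel is deposited exactly once, the flux of a galaxy lying fully inside the map is conserved, while a galaxy straddling the border contributes only part of its flux, as stated in the text.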

Note that the images produced this way are not limited 
in magnitude (up to the resolution limit of the simulation) 
and include the contributions of all galaxies in the cone. It 
is important that all sources are added (even though some 
may be fainter than the detection limit) for estimating the 
background intensity. This is particularly relevant to far in- 
frared or sub-millimetre surveys which are limited by confu- 
sion and where the background contains a good part of the 
information. 

Example mock maps are shown in Fig. 4. The left-hand
panel shows an optical view (R band) of a 3 x 3 square
arcmin field, and the right-hand panel shows the far-IR
view of the same field (at 170 μm). This latter image was
convolved with a Gaussian PSF of width 10 arcsec to mimic
an observation by the PACS instrument on-board Herschel.
No noise was added to these mock maps.
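A Gaussian PSF convolution of this kind can be sketched with NumPy's FFT routines. The sketch below makes two assumptions that the text does not state: the quoted "width" is treated as a FWHM, and the image is treated as periodic (which is what an FFT convolution implies). The function name is hypothetical; a width of 10 arcsec would be converted to pixels by dividing by the pixel scale of the map.

```python
import numpy as np


def gaussian_psf_convolve(image, fwhm_pix):
    """Convolve an image with a circular Gaussian PSF (FWHM in pixels) via FFT."""
    sigma = fwhm_pix / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    ky = np.fft.fftfreq(image.shape[0])[:, None]    # cycles per pixel
    kx = np.fft.rfftfreq(image.shape[1])[None, :]
    # Fourier transform of a unit-integral Gaussian of standard deviation sigma.
    transfer = np.exp(-2.0 * np.pi**2 * sigma**2 * (kx**2 + ky**2))
    return np.fft.irfft2(np.fft.rfft2(image) * transfer, s=image.shape)


point = np.zeros((64, 64))
point[32, 32] = 1.0
blurred = gaussian_psf_convolve(point, fwhm_pix=5.0)
```

The transfer function equals one at zero frequency, so the total flux of the map is unchanged by the convolution.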

Finally, note that MOMAF can also generate all-sky
maps, using the HEALPix pixelisation (Górski et al. 2002)
chosen by the Planck consortium.

3.3.2 Post-observation maps 

It is considerably more difficult to generate post-observation
maps because separate modelling is required for each
telescope/detector combination. MOMAF is designed to feed
Instrument Numerical Simulators (INS) with realistic catalogues
or pre-observation maps. In the optical and near-infrared
domain, a ready-made general tool for post-observation map
generation is available in Skymaker (Erben et al. 2001). We
briefly discuss this general INS here as an example of MOMAF
possibilities.

Skymaker is an image simulation program, originally
designed to assess SExtractor detection and measurement
performances (Bertin & Arnouts 1996). The code (currently
at version 2.3.4) has been much improved since. It is capable
of simulating star and galaxy images with a high level of
accuracy. Galaxies in Skymaker are modelled as a combination
of a de Vaucouleurs bulge and an exponential disk. Vari- 
ous sources of noise and convolution with the Point Spread 
Function can be included as desired. 

There are two input files required for Skymaker to gen- 
erate an image. One is a configuration file specifying the 
characteristics of telescope and detector and the seeing con- 
ditions. The second is the source list, which can include stars 
and galaxies. A typical line for a galaxy in a source list for 
Skymaker includes the "total" magnitude, the bulge-to-total 
luminosity ratio, bulge equivalent-radius in arc-second, pro- 
jected bulge aspect ratio, bulge position angle in degrees, 
disk scale length in arc-second, disk aspect ratio and disk 
position angle in degrees. All of these are natural outputs of 
GALICS/momaf, as discussed in the previous section. There
is no provision for adding starburst components in Skymaker,
so we use a workaround for such cases: we add the
burst components from GALICS as additional bulges, with a
scale-length obtained from GALICS, a bulge-to-total luminosity
ratio of 1.0 and the appropriate starburst magnitude as
the "total" magnitude.

Figure 4. Left-hand panel: mock map of a 3 x 3 arcmin^2 field, in the R band. Right-hand panel: mock map of a 1 deg^2 field,
at 170 μm, convolved with a Gaussian PSF of width 10 arcsec. The central square in this image outlines the field shown in the optical in
the left-hand panel.
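The starburst workaround can be illustrated with a short Python sketch that turns one galaxy into one or two source records. The field names below are hypothetical labels for the quantities listed in the text and are not the actual Skymaker file syntax; treating the burst as round (aspect ratio 1, position angle 0) is an additional assumption made here for illustration.

```python
FIELDS = ("mag", "bulge_to_total", "bulge_radius", "bulge_ratio", "bulge_pa",
          "disc_scale", "disc_ratio", "disc_pa")


def skymaker_like_entries(galaxy, burst_mag=None, burst_radius=None):
    """Return one record for the disc+bulge, plus one extra 'bulge-only' record for a burst."""
    main = {name: galaxy[name] for name in FIELDS}
    entries = [main]
    if burst_mag is not None:
        burst = dict(main)
        # Workaround from the text: the burst is a pure bulge (B/T = 1.0) whose
        # "total" magnitude is the starburst magnitude.
        burst.update(mag=burst_mag, bulge_to_total=1.0, bulge_radius=burst_radius,
                     bulge_ratio=1.0, bulge_pa=0.0)   # round burst: assumption
        entries.append(burst)
    return entries


example = {"mag": 21.3, "bulge_to_total": 0.2, "bulge_radius": 0.4, "bulge_ratio": 0.8,
           "bulge_pa": 30.0, "disc_scale": 1.1, "disc_ratio": 0.6, "disc_pa": 30.0}
records = skymaker_like_entries(example, burst_mag=23.0, burst_radius=0.1)
```

With a bulge-to-total ratio of 1.0 the disc parameters of the extra record carry no light, so they can simply be copied from the host galaxy.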



4 LIMITATIONS OF THE METHOD 

Ideally, one should build a mock catalogue from a simulated
volume much larger than the light-cone, and output
the physics continuously in time by propagating photons towards
the observer through the expanding simulated universe. Although
such simulations are becoming feasible today, their
computational cost is still prohibitive. The method we propose
with MOMAF stems from the same philosophy as GALICS:
it consists of extracting as much information as possible
from a given simulation, and using it to build realistic
catalogues at relatively low computational expense. Of course,
however sophisticated the method we use, several limita- 
tions appear in MOMAF mock catalogues because they are 
built from the replication of finite information. The purpose 
of this section is to understand how the replication process 
affects our predictions. 

The most important limitations of MOMAF result from 
the fact that we use a finite volume to describe the whole 
Universe. In order to do so, we have to replicate the simu- 
lated volume many times along and across the line of sight. 
Now, because we use the random tiling method to proceed 
with these replications, some clustering information is lost. 
This results in a negative random tiling bias which is
discussed in Sec. 4.1. A more subtle effect comes from the
fact that the finite volume of the simulation used to build
a mock catalogue does not describe density fluctuations on
large scales. Thus these fluctuations will be missing from the
mock catalogues. This results in biases on counts variance
estimates and correlation function estimates. These finite
volume effects are described in Sec. 4.2.



In this section, we also check that other possible effects
are under control. In Sec. 4.3.1 we investigate the effect of the
finite time-step on the apparent properties of galaxies in mock
catalogues. In Sec. 4.3.2 we check the impact of the mass resolution
of the root simulation on different observable statistics.

4.1 Random tiling bias 

A negative bias on correlation functions is introduced in 
mock catalogues by the random tiling approach, which 
comes from the fact that we decorrelate pairs of galaxies 
from one underlying box to the other when re-shuffling them 
to suppress (periodic) replication effects. Here, we first esti- 
mate this bias on the spatial two-point correlation function, 
and then project the results to derive the bias on the angular 
correlation function. 

4.1.1 Spatial correlation function (SCF)

The spatial correlation function (SCF) can be computed by
measuring the number of pairs of objects separated by a
given distance. If one uses the estimator of Landy & Szalay
(1993, hereafter LS93):

$$\xi(r) = \frac{DD(r) - 2DR(r) + RR(r)}{RR(r)}, \tag{6}$$

where DD(r), DR(r) and RR(r) are the numbers of data-data,
data-random and random-random pairs separated by
distance r respectively, one gets, through logarithmic
differentiation, the relative error on $\xi(r)$:

$$\frac{\delta\xi(r)}{\xi(r)} = \frac{\delta DD(r)}{DD(r)}, \tag{7}$$

because the random sample is not affected by the replication
technique.
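As an illustration, the LS93 estimator can be evaluated from raw pair counts with a few lines of Python. The normalisation of the raw counts by the total number of pairs in each sample is the usual convention and is an assumption here, since the text does not spell it out; the function name and the numbers in the usage lines are hypothetical.

```python
import numpy as np


def landy_szalay(dd_raw, dr_raw, rr_raw, n_data, n_random):
    """LS93 estimator per separation bin, from raw pair counts."""
    dd = np.asarray(dd_raw, dtype=float) / (n_data * (n_data - 1) / 2.0)
    dr = np.asarray(dr_raw, dtype=float) / (n_data * n_random)
    rr = np.asarray(rr_raw, dtype=float) / (n_random * (n_random - 1) / 2.0)
    return (dd - 2.0 * dr + rr) / rr


# Toy counts for two separation bins (illustrative only).
xi = landy_szalay(dd_raw=[9, 4.5], dr_raw=[20, 20], rr_raw=[19, 19],
                  n_data=10, n_random=20)
```

In the first bin the normalised counts are DD = 0.2, DR = 0.1 and RR = 0.1, so the estimator gives 1; in the second bin DD = DR = RR and the estimator gives 0, as expected for an unclustered sample.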
Now, take a galaxy near the edge of an underlying box
(say at a distance d < r from the edge). The mean number of
pairs that will be missed for this galaxy, due to replication,
is:

$$\delta\widetilde{DD}_g(d, r) = 2\pi r^2\, n\, \xi(r)\, (1 - d/r)\, dr, \tag{8}$$



where n is the mean density of galaxies, and the subscript g
denotes that $\delta\widetilde{DD}_g$ is the number of pairs lost for one galaxy.
The tilde over DD denotes that we only consider pair loss
through one side of the box. To compute the loss of pairs due
to one side of the underlying box, we integrate the previous
equation over d, from 0 to 2r, namely:

$$\delta\widetilde{DD}(r) = 2\pi r^2\, n\, \xi(r)\, dr \int_{d=0}^{2r} n L_{\rm box}^2\, (1 - d/r)\, {\rm d}d = 2\pi r^3 L_{\rm box}^2\, n^2\, \xi(r)\, dr. \tag{9}$$



Note that we neglected corner effects here. This is justified
by the fact that one should always consider separations much
smaller than the size of the box (i.e. $r \ll L_{\rm box}$).

To get the pair loss over a whole box, simply multiply
the previous result by 6 (the number of sides):

$$\delta DD(r) = 12\pi r^3 L_{\rm box}^2\, n^2\, \xi(r)\, dr. \tag{10}$$



Had we not broken the continuity in the density field
between each underlying box, the number of pairs would
simply be, for a whole box:

$$DD(r) = 4\pi r^2\, dr\, n\, (1 + \xi(r)) \times n L_{\rm box}^3, \tag{11}$$

where the first factor on the right-hand side is the number of
pairs expected for one galaxy, and the second factor is the
number of galaxies in the box. The relative error in the number
of pairs, due to shifting the underlying boxes, is thus:

$$\frac{\delta DD(r)}{DD(r)} = \frac{3r}{L_{\rm box}}\, \frac{\xi(r)}{1 + \xi(r)}. \tag{12}$$



For a numerical estimate, consider the Lyman break
galaxy (LBG) population. For these galaxies, we expect
$\xi(r) = 1$ at $r \sim 6\,h^{-1}$ Mpc. Thus, for our simulation, with
$L_{\rm box} = 100\,h^{-1}$ Mpc, one finds $\delta\xi/\xi < 10\%$ for LBGs, at
$6\,h^{-1}$ Mpc.
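The numerical estimate above can be reproduced with a one-line function, assuming that Eq. 12 has the form $(3r/L_{\rm box})\,\xi/(1+\xi)$ and that $\xi$ is a power law $(r/r_0)^{\gamma}$ with negative $\gamma$. The function name and the slope used in the usage line are illustrative choices.

```python
import numpy as np


def random_tiling_bias(r, l_box, r0, gamma):
    """Relative loss of clustering signal (Eq. 12) for xi(r) = (r / r0)**gamma."""
    xi = (np.asarray(r, dtype=float) / r0) ** gamma
    return 3.0 * np.asarray(r, dtype=float) / l_box * xi / (1.0 + xi)


# LBG-like case: xi = 1 at r = r0 = 6 h^-1 Mpc, in a 100 h^-1 Mpc box.
bias_lbg = float(random_tiling_bias(6.0, l_box=100.0, r0=6.0, gamma=-1.8))
```

At r = r0 the power law gives $\xi = 1$ whatever the slope, so the bias is $(3 \times 6 / 100) \times 1/2 = 0.09$, consistent with the "less than 10 per cent" quoted in the text.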

In Fig. 5 we show the theoretical underestimation
of spatial correlation function measurements from our
catalogues. In the plots, we assume a correlation function of the form
$\xi(r) = (r/r_0)^{\gamma}$, and we let $r_0$ and $\gamma$ vary. For a wide range
of these two parameters, the error due to transforming the
underlying boxes is less than 10% from 1 to 10 $h^{-1}$ Mpc.

In Fig. 6 we show a measurement of the bias on $\xi$ introduced
by randomising the boxes. To do this, we cut our snapshot
volume into $8^3$ sub-boxes to which we applied translations,
rotations and inversions as described above. We then
measured $\xi(r)$ on the original snapshot (solid curve) and on
the shuffled snapshot (dashed line). We plotted in Fig. 6
the prediction for the bias according to the above calculation
as the dot-dashed line. The agreement between the
measurement and the analytical prediction is very good on
scales up to $\sim L_{\rm box}/5$ (with $L_{\rm box}$ being $100/8\,h^{-1}$ Mpc, as
shown by the vertical line in Fig. 6). In the previous section,
we showed that this scale is where finite volume effects
come into play. Although most finite volume effects are not
present here because we use all the sub-boxes to fill the
simulated volume, the randomisation of sub-boxes kills any
signal at scales larger than a sub-box. This is not described
by the above analytic calculation and results in the sharp
cutoff of $\xi$ at $\sim L_{\rm box}/5$.

Figure 5. Expected relative underestimation of the spatial correlation
function, assuming $\xi(r) = (r/r_0)^{\gamma}$. The left-hand
panel shows variations with $r_0$, with $r_0$ = 5, 6, 7, 8, 9, 10 $h^{-1}$ Mpc
from the bottom curve to the top curve, $\gamma$ being fixed to -1.8.
The right-hand panel shows the dependence on $\gamma$, with $\gamma$
spanning the range [-2, -1], from bottom to top, $r_0$ being fixed to
7 $h^{-1}$ Mpc. On each panel, the right-hand vertical line shows
the approximate upper limit of validity of measurements of $\xi$ (one
tenth of $L_{\rm box}$). The left-hand vertical line roughly indicates
the size of a cluster, below which our spatial information is
uncertain.

Figure 6. Measured spatial correlation function of the dark matter
particles in the z = 0 snapshot (solid curve) and on the same
snapshot re-shuffled (dashed line). Also plotted is the prediction
from Eq. 12 (dot-dashed line). The error bars attached to the solid
line are Poissonian errors. The vertical line indicates the size of
sub-boxes used for the re-shuffling.



4.1.2 Angular correlation function (ACF)

We can use the bias on the SCF derived above to evaluate
that on the ACF. Let us first recall that, in the small-angle
approximation, the angular correlation function is related to
the spatial one by (Bernardeau et al. 2002):

$$w(\theta) = \int d\chi\, \chi^4\, D_M(\chi)\, \psi^2(\chi) \int \theta\, \xi(r)\, dx, \tag{13}$$

where $\chi$ is the radial distance, $D_M$ is the angular distance
($D_M = \chi$ in a flat universe), $\psi$ is a selection function
satisfying $\int \chi^2 \psi(\chi)\, d\chi = 1$, and r is the separation distance,
related to the angular separation $\theta$ and the integration variable
x through the relation $r = D_M\, \theta\, (1 + x^2)^{1/2}$. If we now
introduce the bias on $\xi$ as $\xi \mapsto \xi + \delta\xi$, with $\delta\xi$ given by Eq.
12, we can derive the corresponding bias on the ACF from
Eq. 13:



$$\delta w(\theta) = \int d\chi\, \chi^4\, D_M(\chi)\, \psi^2(\chi) \int \frac{3 r \theta}{L_{\rm box}}\, \frac{\xi^2(r)}{1 + \xi(r)}\, dx. \tag{14}$$



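Assuming that $\xi$ can be written as the power law $(r/r_0)^{\gamma}$,
and using $D_M(\chi) = \chi$ in a flat universe, we end up with:

$$\delta w(\theta) = \frac{3\, r_0^{-\gamma}\, \theta^{2+\gamma}}{L_{\rm box}} \int d\chi\, \chi^{6+\gamma}\, \psi^2(\chi) \int \frac{(1 + x^2)^{1/2+\gamma}}{(\chi\theta/r_0)^{-\gamma} + (1 + x^2)^{\gamma/2}}\, dx. \tag{15}$$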



Note that for this result, we also assumed that the SCF
does not vary with redshift. This is obviously wrong in general
but is justified if the selection function $\psi$ is narrow
enough (e.g. for LBGs).

Finally, using Eqs. 13 and 15, and choosing a
selection function, one can compute numerically the relative
bias induced on ACF measurements by the transformations
of underlying boxes. An example is given in Fig. 7.
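A numerical evaluation of this relative bias can be sketched as follows. The sketch assumes the small-angle relation in the form $w(\theta) \propto \int d\chi\, \chi^5 \psi^2 \int \theta\, \xi(r)\, dx$ with $r = \chi\theta(1+x^2)^{1/2}$ (flat universe), a bias $\delta\xi = (3r/L_{\rm box})\,\xi^2/(1+\xi)$, a power-law $\xi = (r/r_0)^{\gamma}$ with $\gamma < -1$, and a top-hat selection function. The substitution x = sinh(t), the grid sizes and the function names are implementation choices made here, not taken from the paper.

```python
import numpy as np


def _integrate(values, grid):
    """Trapezoidal rule along the last axis."""
    return np.sum(0.5 * (values[..., 1:] + values[..., :-1]) * np.diff(grid), axis=-1)


def acf_relative_bias(theta, r0, gamma, l_box, chi_min=1750.0, chi_max=2250.0,
                      n_chi=64, n_t=2000, t_max=8.0):
    """delta_w / w for a top-hat selection function between chi_min and chi_max."""
    chi = np.linspace(chi_min, chi_max, n_chi)
    t = np.linspace(0.0, t_max, n_t)        # x = sinh(t)
    s = np.cosh(t)                          # (1 + x^2)^(1/2), and also dx/dt
    r = chi[:, None] * theta * s[None, :]
    xi = (r / r0) ** gamma
    delta_xi = 3.0 * r / l_box * xi**2 / (1.0 + xi)
    # Inner integrals over x >= 0; the x < 0 half doubles both and cancels in the ratio.
    inner_w = _integrate(xi * s, t)
    inner_dw = _integrate(delta_xi * s, t)
    weight = chi**5 * theta                 # constant psi^2 cancels in the ratio
    return _integrate(weight * inner_dw, chi) / _integrate(weight * inner_w, chi)


# Angles (radians) for transverse separations of about 0.05 and 3 h^-1 Mpc at chi = 2000.
bias_small = acf_relative_bias(0.05 / 2000.0, r0=5.0, gamma=-1.8, l_box=100.0)
bias_large = acf_relative_bias(3.0 / 2000.0, r0=5.0, gamma=-1.8, l_box=100.0)
```

The result is a $\xi$-weighted average of the spatial bias along the line of sight, so it can never exceed the maximum of the spatial bias itself.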



4.2 Finite volume effects 

Several limitations arise because we use a finite volume to 
describe the whole Universe. They are basically due to the 
fact that a finite volume V does not describe density fluctuations
on scales typically larger than $\sim V^{1/3}$. In other
words, although the mean number of galaxies in a simula- 
tion can be tuned to fit that observed in the Universe, the 
simulation does not describe the dispersion about this mean 
value. How this affects statistics from our catalogues is the 
question we address in this section. The simplest statistic 
we are interested in is galaxy counts, as a function of magnitude
or redshift. Mock catalogues can be used in two ways:
(i) to normalise models, and (ii) to estimate errors (including
cosmic variance). In Sec. 4.2.1 we discuss how finite
volume affects both the counts and their variance. Then,
in Sec. 4.2.2 we describe the bias on correlation functions
introduced by finite volume effects.
Figure 7. Expected relative underestimation of the angular correlation
function, assuming $\xi(r) = (r/r_0)^{\gamma}$. The left-hand
panel shows variations with $r_0$, with $r_0$ = 5, 6, 7, 8, 9, 10 $h^{-1}$ Mpc
from the bottom curve to the top curve, $\gamma$ being fixed to -1.8.
The right-hand panel shows the dependence on $\gamma$, with $\gamma$
spanning the range [-2, -1], from bottom to top, $r_0$ being fixed to
7 $h^{-1}$ Mpc. The selection function chosen here is simply a top hat
centred at $\chi = 2000\,h^{-1}$ Mpc and of width 500 $h^{-1}$ Mpc.



4.2.1 Effects on estimates of counts variance

Two variances are relevant for counts in mock catalogues. 
The first is the variance which tells us about the dispersion 
of number counts from mock catalogues each generated from 
a different simulated volume. The second is the variance that 
describes the dispersion in number counts from mock cata- 
logues made from a unique simulated volume. This variance 
tells us to what extent we can estimate cosmic variance with 
mock catalogues based on a given simulation. Let us proceed 
to virtual experiments to understand these quantities. 

• Imagine we have a large number N of simulations at
hand, all describing an equal volume V, but with initial
conditions drawn from a much larger volume. From each
simulation, we build a mock catalogue using momaf, and then
count galaxies brighter than some magnitude limit. Finally,
we measure the variance $\sigma_1^2$ of the counts obtained in this
way. Now, imagine that we also have N mock catalogues,
each generated using an ideal technique and a simulated
volume much larger than that of the light-cone. Call $\sigma_2^2$ the
variance in the counts measured from these catalogues. In
the case where the volume of the cone is much smaller than
volume V, the two above variances will be equal. In the more
realistic case where V is smaller than the volume of the cone,
one will measure $\sigma_1^2 > \sigma_2^2$: replication enhances the bias
of the simulated volume, thus increasing the dispersion of counts
between catalogues.

• As a second experiment, imagine one has a unique simulated
volume V as above, and builds many mock catalogues
from it. These catalogues will be different from one another
because of the random tiling process and because the light-cone
may intersect different sections of V in different realisations.
As before, measure the variance $\sigma_3^2$ of the counts from
these catalogues. Again, if the simulated volume is much
larger than that of the light-cone, one will measure $\sigma_3^2 \simeq \sigma_2^2$.
In this case, one can use mock catalogues to estimate cosmic
variance. However, in the case where the volume of the
cone is larger than V, $\sigma_3$ will be found lower than $\sigma_2$, because
replications do not add large-scale fluctuations. In the
extreme case where the cone is very large compared to the
simulated volume, $\sigma_3$ will tend to zero, because the cone
encloses all the information contained in V.

Sample | Area      | Selection criteria  | <z> (z_min ; z_max) | <i> (i_min ; i_max) | theta_100 (deg) | N_100
APM    | 100 deg^2 | 17 < B_J < 20       | 0.18 (0.04 ; 0.73)  | 65.3 (54 ; 69)      | ~ 10            | ~ 5.5
K20    | 1 deg^2   | K_s < 20            | 0.53 (0.02 ; 1.74)  | 58.0 (41 ; 69)      | ~ 4.5           | ~ 12
Counts | 1 deg^2   | chi < 5500 h^-1 Mpc |                     |                     |                 |

Table 2. Geometry of mock catalogues used for comparison to data from the APM and the K20 (see text), and to make K_s-band counts.
All mock catalogues have a square surface, of area given in the second column. The third column states how galaxies are selected in
these catalogues (in the "Counts" case, no photometric selection is applied, but the catalogues are truncated at a comoving distance
of 5500 $h^{-1}$ Mpc from the observer). The fourth column gives the mean and span of the redshift distribution. The fifth column gives
the mean and range of outputs used (output 70 is z = 0). The sixth column gives the angular size of the simulated volume (of side
$L_b = 100\,h^{-1}$ Mpc) at the mean redshift of the sample. The last column gives the number of radial replications needed to reach the mean
redshift using the full simulated volume.



In Fig. 8 we show K-band counts measured from various
mock catalogues having the geometry defined in the
third line of Table 2 ("Counts" catalogues). The shaded area
shows the locus of K20 counts from Cimatti et al. (2002),
including Poissonian error bars. The filled symbols and their
error bars give the mean and standard deviation for counts
measured from 20 mock light-cones made using the standard 
simulated volume. Then we cut our root simulation into 125 
sub-boxes, and made a mock catalogue out of each sub-box. 
The open diamonds give the mean and standard deviation 
of counts measured from these 125 mock light-cones. Finally 
the open triangles give the mean and standard deviation of 
counts measured from 20 mock catalogues made from a single
sub-box. The upper panel of Fig. 8 compares the relative
standard deviations of these three measures. 

First, note that changing the size of the volume used 
to make mock catalogues does not change the shape of the 
counts, but only the amplitude. This was expected since evo- 
lution is the same in all sub-boxes. This tells us that if our 
root simulation is well normalised, counts from mock cata- 
logues are not sensitive to the size of the simulated volume 
and can thus be used with confidence to normalise theory 
to observations. 

Second, consider the difference between open and filled
triangles in the upper panel of Fig. 8, that is, estimates of
cosmic variance from cones made using one sub-box or the
full simulated volume. As expected, we find that using a
small box leads to an under-estimate of the variance.
Table 2 tells us that the number of full boxes used to describe
galaxies brighter than $K_s = 20$ is about 12 along the line of
sight at the median redshift. This situation is at the limit 
where we can correctly estimate cosmic variance, since the 
light-cone only intersects a fraction of the box volume in 
each underlying box. For the sub-box case, the light-cones 
include a full underlying box at the median redshift, and 
about 60 sub-boxes are replicated along the line of sight to 
reach this redshift. In this regime, the angular correlation 
function is largely under-estimated at the scale of the cata- 
logue (because it is larger than the scale of a box), and so 
the estimated cosmic variance is under-estimated too. 
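The two geometric quantities used in this argument, the angular size of the box and the number of radial replications at a given redshift, can be estimated with a short Python sketch. It assumes a flat universe with a cosmological constant; the value of `omega_m` used in the usage line is an illustrative assumption (the cosmology of the simulation is not restated in this section), and the function names are hypothetical.

```python
import numpy as np

C_OVER_H0 = 2997.92458  # c / H0 in h^-1 Mpc


def comoving_distance(z, omega_m, n_steps=2048):
    """Comoving distance in h^-1 Mpc for a flat universe with a cosmological constant."""
    zz = np.linspace(0.0, z, n_steps)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))
    return C_OVER_H0 * np.sum(0.5 * (inv_e[1:] + inv_e[:-1]) * np.diff(zz))


def box_footprint(z, l_box, omega_m):
    """Angular size of the box (degrees) and number of boxes stacked radially out to z."""
    chi = comoving_distance(z, omega_m)
    return np.degrees(l_box / chi), chi / l_box


# Illustrative call for a 100 h^-1 Mpc box at the mean redshift of a K20-like sample.
theta_deg, n_radial = box_footprint(0.53, l_box=100.0, omega_m=0.3)
```

In a flat universe the two numbers are linked: the angular size in radians is simply the inverse of the number of radial replications, so a box that must be stacked many times along the line of sight also subtends a small angle on the sky.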




Figure 8. Effect of finite volume on K-band counts. Lower panel:
the filled (respectively open) triangles show the mean number
counts obtained from 20 catalogues made from our standard
simulation (resp. from one sub-box), the error bars giving the
standard deviation about this mean. The open diamonds show the
mean number counts measured from 125 mock catalogues, each
built from a different sub-box. The shaded area shows the locus
of the K20 counts from Cimatti et al. (2002). Upper panel: relative
standard deviation of the counts (same symbol code as lower
panel).



Third, it is interesting to consider the difference be- 
tween the filled triangles and the open diamonds. At bright 
magnitudes, the two give the same variance. This is because 
the volume probed by the mock catalogues is much smaller 
than the volume of a sub-box, so variance is well estimated 
with both methods, and is in fact Poissonian. At interme- 
diate magnitudes, the volume probed by the mock cone is 
smaller than the full simulated volume, yet larger than that 
of a sub-box. Hence, the sub-box variance saturates at higher 
values. At the faint end, the light-cone is larger than the full
simulated volume, so the variance shown with filled triangles
suffers from a negative bias similar to that shown by
the open triangles. The three regimes are spanned here and
show that in practice, robust estimates of cosmic variance 
require a simulated volume much larger than the volume 
probed by the mock catalogue. 

Redshift distributions are affected by finite volume effects
in two ways. First, the variance and mean of the redshift
distributions will change with box size. This effect is the
same as that described above for the counts. Second, because
the smaller the box, the more replications are involved, repeated
structures may imprint periodic features in N(z). Thanks
to the random tiling technique, this problem is avoided. In
Fig. 9 we show the differential (upper panel) and cumulative
(lower panel) redshift distributions of galaxies selected
as in the K20 survey (Cimatti et al. 2002). The shaded area
(resp. dotted curve) indicates the locus of the data in the
upper panel (resp. lower panel). In both panels, the curves
(which mostly overlap) show the mean distributions
measured from 20 mock catalogues made with the full
simulated volume and from 125 mock catalogues each made
with a different sub-box. The error bars show the standard
deviations about these means (the larger error bars correspond
to the sub-box catalogues). Fig. 9 shows that the redshift
distribution does not change in shape when the size
of the box varies. Although the details of N(z) will differ
from one catalogue to the other, the statistical significance
of redshift distributions is thus robust, and found to be in
good agreement with K-band observations.

Figure 9. Effect of finite volume on the redshift distribution of
a flux-limited sample. The solid (resp. dashed) curve shows the
mean redshift distribution estimated from 20 mock catalogues
made from the full simulated volume (resp. 125 mock catalogues
each made from one different sub-box). The attached error bars
give the standard deviations (the small ones correspond to the
solid curve). The upper panel gives the differential distributions
while the lower panel gives the normalised cumulative distributions.
The shaded area and dotted line show the locus of the K20
data in the top and bottom panels (see text). The agreement with
observations is very good.

The agreement found with the K20 redshift distribution
is an important success of the GALICS model, given
the difficulty experienced by other models in achieving this
task. The redshift distribution observed in the 2dFGRS (e.g.
Colless et al. 2001) also seems to have been challenging for
modellers to reproduce, and it is interesting to see how GALICS
and MOMAF pass this test. The shaded area in Fig. 10
shows the redshift distribution of 2dF galaxies given by
Colless et al. (2001). This distribution includes the whole
survey, and thus corresponds to a nominal magnitude cut
at $b_J = 19.45$. Because of various sources of incompleteness,
however, the effective magnitude cut is more likely to lie
around $b_J = 19.3$ (see Fig. 14 of Colless et al. 2001). The
solid histogram in Fig. 10 shows the average redshift distribution
measured in 20 mock surveys of 10 x 75 square degrees,
limited in apparent magnitude at $b_J = 19.3$. The associated
error bars show the dispersion around this mean. The
dashed lines show the redshift distributions corresponding
to an apparent magnitude cut at $b_J = 19.2$ (lower line) and
$b_J = 19.4$ (upper line). The comparison suggests that the
evolution of the $b_J$-band luminosity function predicted by
GALICS is incorrect, giving too many bright galaxies at high
redshifts. Indeed, an apparent magnitude cut at $b_J \sim 19.1$ is
necessary to bring our redshift distribution into better agreement
with the 2dFGRS results. Note however that the
scope of this comparison is limited in several ways. First, one
should include a proper description of the complex selection
function of the 2dFGRS for a more meaningful comparison.
Although beyond the scope of this paper, this is readily feasible
by applying the masks of the 2dFGRS to MOMAF mocks
with similar geometry. Second, the dispersion shown in Fig.
10 tells us that despite the huge amount of data gathered by
the 2dFGRS, cosmic variance is still quite large. Following
the above discussion on finite volume effects, and looking
back at Fig. 8, one sees that this dispersion is bound to be
an under-estimate of the true cosmic variance because of
the many replications involved in 2dFGRS-like mock surveys.
This tells us that we need a bigger simulated volume
to actually constrain the model: one needs to have realistic
cosmic variance at a survey's size before hoping to discriminate
between different models. Finally, this example shows
how useful the MOMAF software is to carry out detailed comparisons
of models with various datasets.

Figure 10. Comparison of GALICS to the 2dFGRS redshift distribution.
The shaded area shows data from Colless et al. (2001).
The solid histogram (resp. lower and upper dashed histograms)
shows the GALICS redshift distribution for galaxies brighter than
$b_J = 19.3$ (resp. 19.2, 19.4) estimated from 20 mock surveys of
10 x 75 square degrees. The error bars show the dispersion in the
estimates. The same arbitrary normalisation was applied to the
3 histograms.



4.2.2 Effects on estimates of 2-point correlation functions

Finite volume effects alter correlation functions in a complex
way. Let us first discuss what happens to the spatial correlation
function (SCF) in a cubical box such as the simulated
volume. The situation in mock catalogues is analogous but
also includes projection effects. Following LS93 we relate the
correlation function $\hat\xi$ contained in the simulated volume V
to the "real" $\xi$ as

$$1 + \hat\xi = \frac{1 + \xi}{1 + \bar\xi_V}, \tag{16}$$

where $\bar\xi_V$ is the average of $\xi$ over the volume V.
At small separations (compared to $V^{1/3}$), where $\bar\xi_V \ll \xi$, the
bias is negligible. At large scales, $\xi \sim \bar\xi_V$ and $\hat\xi$ falls down to
0. This bias directly results from the fact that the variance
cannot be estimated properly at the simulated volume scale,
from only one simulated volume. Let us carry out numerical
tests to better understand the finite volume bias on the spatial
correlation function. We again cut our standard simulation
of side $L_b = 100\,h^{-1}$ Mpc into 125 cubic sub-boxes of side
$L_{sb} = 20\,h^{-1}$ Mpc, and we measure the spatial correlation
function (SCF) in all these 126 boxes, for galaxies brighter
than B = -19. This magnitude cut leaves us with about 150
galaxies per sub-box. In Fig. 11 we plot the SCF measured
from the full simulation ($\xi_{100}$) with diamonds, and the average
of the 125 measures on sub-boxes ($\langle\xi_{20}\rangle = \sum_i \xi_{20,i}/125$)
as stars. The error bars attached to the stars show the standard
deviation from the 125 estimates of $\xi_{20}$. Comparison
of $\xi_{100}$ and $\langle\xi_{20}\rangle$ shows that finite volume effects translate
into a negative bias at all scales, with a rather sharp cutoff
at $r \sim L_{sb}/5$. The dashed line shows $\langle\xi_{20}\rangle$ corrected for
the integral constraint given in Eq. 16. The agreement of
the dashed line with the diamonds is very good at large
scales. At separations smaller than $r \sim 1\,h^{-1}$ Mpc, sub-box-to-sub-box
fluctuations (both due to sparse sampling and to
clustering) are responsible for the remnant discreteness bias.
To understand this, let us consider the pair-weighted average
of $\xi_{20}$:

$$\tilde\xi_{20} = \frac{\sum_i n_i^2\, \xi_{20,i}}{\sum_i n_i^2}, \tag{17}$$

where £20,1 is the correlation function measured (with the 
estimator from LS93) on the rii galaxies of sub-box i. This 
weighted average is shown with the triangles on Fig. 1111 
Notice that in the case where edge effects are negligible (i.e. 
at small separations), one finds 



$$\tilde{\xi}_{20} = \frac{\sum_i \left( DD_i - 2DR_i + RR_i \right)}{\sum_i RR_i} \simeq \frac{DD - 2DR + RR}{RR} \qquad (18)$$



where $DD$, $DR$, and $RR$ are the numbers of data-data,
data-random, and random-random pairs in a given separation
bin for the whole simulated box. In other words, the
pair-weighted average of the correlation functions of the
sub-boxes is equivalent, at small scales, to using the LS93
estimator for the whole box, and is thus only affected by the
integral constraint. Now, the main difference between this
estimate and the estimate obtained from the full simulation
(open diamonds) is that cross pairs between two sub-boxes
are not counted. In particular, as expected, this estimator
converges to the biased estimator $\langle \xi_{20} \rangle$ at large scales.
But at small scales, the pair-weighted average $\tilde{\xi}_{20}$ partly
captures sub-box-to-sub-box fluctuations through the variations
of $n_i^2$, and thus remains above $\langle \xi_{20} \rangle$. Still, $\tilde{\xi}_{20}$ is
a combination of estimates of the SCF on small boxes, which are
contaminated at all scales by the integral constraint effect.
Hence $\tilde{\xi}_{20}$ remains below the "exact" result. When
$\tilde{\xi}_{20}$ is corrected for the integral constraint as in Eq. 16
(solid line in Fig. 11), the result matches $\xi_{100}$ nearly
perfectly, as expected.
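The three operations discussed above (simple average over sub-boxes, pair-weighted average, and correction for the integral constraint) can be illustrated with a minimal Python sketch. It assumes that the integral constraint takes the usual form $1 + \hat{\xi} = (1 + \xi)/(1 + \bar{\xi})$; the function names and the toy numbers are hypothetical and are not part of GALICS or MOMAF.

```python
def simple_mean(xi_sub):
    """Unweighted mean of the sub-box estimates in one separation bin."""
    return sum(xi_sub) / len(xi_sub)


def pair_weighted_mean(xi_sub, n_gal):
    """Mean weighted by n_i**2, i.e. by the number of pairs per sub-box."""
    weights = [n * n for n in n_gal]
    return sum(w * x for w, x in zip(weights, xi_sub)) / sum(weights)


def correct_integral_constraint(xi_hat, xi_bar):
    """Invert 1 + xi_hat = (1 + xi) / (1 + xi_bar) to recover xi (assumed form)."""
    return (1.0 + xi_hat) * (1.0 + xi_bar) - 1.0


# Toy values (illustrative only): three sub-boxes, the denser ones
# being also the more strongly clustered ones.
xi_sub = [0.5, 1.0, 2.0]
n_gal = [100, 150, 200]

print(simple_mean(xi_sub))
print(pair_weighted_mean(xi_sub, n_gal))
print(correct_integral_constraint(1.0, 0.1))
```

In this toy case the pair-weighted mean lies above the simple mean because the densest sub-boxes carry the largest weights, which is the behaviour described in the text for small separations.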

Fig. 12 shows how these finite volume effects affect the
angular correlation function. The solid line shows the mean
of angular correlation functions measured from 20 mock
APM catalogues (see Tab.) made from the full simulated
volume; the attached error bars give the measured standard
deviation. The dashed line and corresponding error bars
show the mean and standard deviation of the ACF measured
from the 125 mock APM catalogues made from the
sub-boxes.

Figure 11. Spatial correlation function of galaxies brighter than
$B = -19$. The diamonds show $\xi_{100}$, computed from the whole
simulation, and the stars show the mean of estimates of $\xi_{20}$ from
125 sub-boxes of side $L_{sb} = 20 h^{-1}$ Mpc. The error bars
show the standard deviation about this mean. The dashed line
shows $\langle \xi_{20} \rangle$ corrected for finite volume effects (see text), and
agrees with the diamonds at large separations.

For the dashed line, the departure from a power-law at
large scales reported in the 3-D case (see Fig. 11) occurs
here at $\theta \sim 0.4$ degree. This is a direct consequence of the
finite volume of the sub-box, and 0.4 degree is here about
one fifth of the angular size of a sub-box at the median
redshift of the survey (see Table). On top of this turn-around,
there is an overall bias which increases slowly with separation,
starting at scales of about one hundredth the size of a
sub-box. This is due to the projection of the bias described
above for the SCF. Now, the open diamonds in Fig. 12 show
the ACF measured from the APM by Maddox et al. (1996).
The data are in very good agreement with our (full-box)
model at scales shorter than $\sim 0.1$ degree, which is about
$L_b/100$ at the median redshift. Beyond this scale, finite
volume effects progressively bias our ACF. The comparison of
our full-box $w(\theta)$ to data from the APM is similar to the
above comparison between sub-box and full-box ACFs. We
thus understand that the large-scale disagreement between
APM and our model is not physical, but due to finite volume
effects: APM data are drawn from an even larger box,
the Universe!

Finally, let us come back to the issue of counts' variance
in mock catalogues. Remembering that the variance of
counts is basically given by the average of the angular
correlation over the survey, we now clearly see how $\sigma_3$ of
the previous section was under-estimated. And we understand
that this under-estimation will occur unless we use a simulated
volume more than ten times larger than the aperture of the
light-cone at the redshift of interest.

Figure 12. The solid (resp. dashed) curve shows the mean ACF
estimated from 20 mock catalogues made from the full simulated
volume (resp. 125 mock catalogues each made from one different
sub-box). The error bars give the standard deviations about these
means. The cutoff in the sub-box-catalogue ACF occurs at about
a fifth of the angular size of a sub-box taken at the mean redshift
of the galaxy sample. Open diamonds show the ACF measured
from the APM (Maddox et al. 1996).
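The link between the variance of counts and the ACF mentioned at the end of the discussion above can be illustrated with a short Python sketch. It assumes the standard relation $\sigma_N^2 = \bar{N} + \bar{N}^2 \bar{w}$, where $\bar{w}$ is the average of $w(\theta)$ over all pairs of positions in the field, and estimates $\bar{w}$ by Monte Carlo in the flat-sky approximation. The power-law amplitude, the slope, the cutoff and all function names are hypothetical choices made for illustration only.

```python
import math
import random


def mean_w_over_field(w, side_deg, n_pairs=20000, seed=0):
    """Monte Carlo average of w(theta) over pairs of points in a square field.

    Flat-sky approximation, so only meaningful for small fields.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        x1, y1 = rng.uniform(0, side_deg), rng.uniform(0, side_deg)
        x2, y2 = rng.uniform(0, side_deg), rng.uniform(0, side_deg)
        total += w(math.hypot(x1 - x2, y1 - y2))
    return total / n_pairs


def counts_variance(n_mean, w_bar):
    """Variance of counts in the field: Poisson term plus clustering term."""
    return n_mean + n_mean ** 2 * w_bar


def w_power_law(theta_deg, amplitude=0.03, slope=0.7):
    """Hypothetical power-law ACF; the floor avoids a division by zero."""
    return amplitude * max(theta_deg, 1e-4) ** (-slope)


def w_truncated(theta_deg, cutoff_deg=0.4):
    """Same ACF with no signal beyond a cutoff, mimicking a small box."""
    return w_power_law(theta_deg) if theta_deg < cutoff_deg else 0.0


w_full = mean_w_over_field(w_power_law, side_deg=1.0)
w_cut = mean_w_over_field(w_truncated, side_deg=1.0)
print(counts_variance(1000.0, w_full), counts_variance(1000.0, w_cut))
```

Because the same random pairs are used for both estimates, the truncated ACF necessarily yields the smaller variance, which mimics the under-estimation expected when the simulated volume is too small.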



4.3 Other effects 

4.3.1 Timestep 

The fact that we use a finite number of outputs, typically
spaced in time by 100 Myr, could affect mock catalogues. We
argued in Sec. 3.2.2 that this was not expected because even
though individual galaxies can undergo a dramatic evolution
during a timestep, the average properties (and their
dispersion) of the overall population evolve at a much slower
pace. Nevertheless, we check this hypothesis in this section
by comparing statistics from mock catalogues made using
different timesteps of the same simulation. Namely, we
compare the counts, redshift distributions and ACFs obtained
with our reference mock catalogues to those obtained with
catalogues made using one snapshot out of ten (see footnote 2).

The resulting counts, redshift distributions and ACFs
are shown in Fig. 13 and show no significant difference
between the fine and coarse time-steps. This shows that
the random tiling method is robust in that the resulting 
mock catalogues do not depend on the time-step used in 
the root simulation, provided the physics was properly in- 
tegrated. The fact that the properties of mock catalogues 
do not change with time-step shows that, at least for the 
selected galaxies, the K-correction and possibly a slow evo- 
lution determine the statistics. This justifies a posteriori the 
first-order correction made for magnitudes in Eq. [I] 

The necessity of using a fine time-step to make mock
observations then mainly comes from the complex analysis
that can be made from them (see Blaizot et al. 2004). In this
perspective, one wants to retrieve the physical properties of
individual galaxies, as well as their hierarchical evolution,
for samples selected according to observational criteria. A
short time-step naturally allows one to analyse evolution in
more detail.

2 Note that in any case, the properties of the galaxies were
computed using all timesteps, which is necessary in order to
properly describe the physics at stake in galaxy evolution (see
Hatton et al. 2003).



4.3.2 Mass resolution

As mentioned in Sec. 2.3, the mass resolution of the DM
simulation affects galaxies in three ways: (i) incompleteness,
(ii) limit redshift, and (iii) "immaturity". These three
limitations, inherent to the hybrid method implemented in GALICS,
will have different effects on statistics measured from mock
catalogues.

(i) Incompleteness sets in when a fraction of galaxies of
a given sample are missed because they would lie in halos
below the mass resolution of the DM simulation. This effect
obviously causes under-estimates of the counts at faint
magnitudes. A more subtle effect is that a sample of galaxies
affected by incompleteness will have a halo mass distribution
biased towards high masses. Because more massive halos are
more clustered, this will induce a positive bias on correlation
functions. These effects cannot be corrected for except by
using simulations with better mass resolution. However, as
shown in Blaizot et al. (2004), this does not prevent one from
using mock catalogues for studying bright galaxies, even at
high redshift.

(ii) The limit redshift is the redshift beyond which no halo 
can be detected in a DM simulation. All possible galaxies at 
higher redshifts are thus missed by GALICS, and are hence 
missing from our mock catalogues. This effect, combined
with incompleteness, is responsible for a faint-end decrease
in the counts.

(iii) Immaturity describes the fact that young galaxies have 
unrealistic properties mainly because the cooling of gas in 
their host haloes was not slowed down by DM accretion 
in sub-resolution progenitors. These galaxies only become a 
significant part of the overall population at redshifts higher 
than ~ 2 in our standard simulation, and they can be easily 
flagged and removed from a sample from our database. In 
terms of apparent magnitudes, they only significantly affect 
K band counts at K > 24. 

The natural, and foreseen, solution to these limitations 
is to increase the resolution of root DM simulations. We 
come back to this perspective in the conclusions. 



5 DATABASE AND WEB INTERFACE 

The current implementation of the GALICS hybrid model
of hierarchical galaxy formation is a package that includes
three main routines (see Fig. 14). First, HaloMaker identifies
haloes in each of the output snapshots. Second, TreeMaker
constructs the halo merging history trees from the list of
halos in all the snapshots, and computes the dark matter
properties of each of the halos. Third, GalaxyMaker deals with
the fate of baryons within the merging history trees. It
computes the properties of hot gas in halos, and follows galaxy
formation and evolution. The outputs are a list of properties
(including absolute magnitudes in standard photometric
bands) and rest-frame spectra for all galaxies in snapshots.
The information produced by a given GALICS post-processing
of the simulation (defined by the choice of the
astrophysical free parameters) constitutes what we hereafter
call the Archives of this post-processing (see GALICS I). Any
change in the list of input parameters will correspond to a
new issue of the post-processing and new output Archives.

Figure 13. Left hand side panel: number counts from a mock catalogue using all available time-steps (solid curve) and one snapshot
out of 10 (dashed curve). Middle panel: redshift distributions using all snapshots (solid histogram) or one out of 10 (dashed histogram).
The data from the K20 survey are shown by the grey area in the upper panel and by the dotted line in the lower panel. Right hand side
panel: angular correlation functions in catalogues containing all the snapshots (solid curve), one out of 2 (dashed curve), one out of 5
(dot-dashed curve) and one out of 10 (3-dot-dashed curve).

The MOMAF post-processing is also a package with three
main routines, detailed in previous sections (see Fig. 14).
First, ConeMaker generates an observing cone by integrating 
along the light-cone through the various snapshots, and by 
managing the radial and transverse replications. The out- 
put is a list of galaxies with apparent properties (including 
apparent magnitudes in standard photometric bands) that 
are computed from the Archives. Second, FieldMaker builds 
mock images from the observing cone, by projecting the cone 
galaxies onto the plane of the sky. Third, any instrument 
simulator can be used to transform these pre-observation im- 
ages into post-observation images, or the rest-frame spectra 
of the Archives into post-observation observer-frame spec- 
tra. We have shown in the previous section how SkyMaker 
can be used in this way, because FieldMaker is able to gen- 
erate the relevant information in the proper format. Clearly, 
many different cones can be generated within a single GAL- 
ICS Archive by changing the observing point, the direction 
of the line-of-sight and/or the aperture. Many pre-observing 
images can be generated from a single cone by changing the 
filter response curves. And many post-observing images can 
be generated from a single pre-observing image by changing 



the instrument simulator. The information produced by a 
given MOMAF post-processing constitutes what we hereafter 
call the Products of this post-processing. 

At this stage, from any single simulation, we have gen- 
erated a set of Archives and Products that includes tables 
of halo and galaxy properties, and FITS files of spectra and 
images. Two big issues obviously appear. First, the size of 
the database makes it very cumbersome. As an example, 
for a standard GALICS post-processing of our ΛCDM
simulation, the total numbers of halos and galaxies generated in
the 70 snapshots respectively amount to 1.5 and 1.8 million. 
The output Archives are about 4.5 GB for tables and 45 GB 
for spectra FITS files, not to speak of the Products. Second, 
the specific information that is relevant for a given user is 
hidden within the bulk of non-relevant information. Let's 
imagine for instance that we want to get the B-band abso- 
lute magnitude and total cold gas mass of a random sample 
of 100 galaxies brighter than apparent magnitude $I_{AB} = 20$.
Extracting this information will require reading tables with 
many columns (the properties) and many rows (the galax- 
ies). It may be possible to anticipate the latter issue, and 
generate many specific tables for many different situations 
and potential users, but the same pieces of information will 
consequently be duplicated many times, which is not the 
proper way to proceed. 

The solution to this conundrum is well known: it con- 
sists in storing the information into a relational database 
(hereafter RDB). Here, we use the word database (with its 
loose meaning) for all the information we want to make avail- 
able, and the word relational database (with its strict mean- 
ing) for the technical way of putting part of this information 
into a specific structure. 

We decided to use MySQL as the relational database 
server. MySQL is a freely available, widely used and ex- 
tremely fast database server which is capable enough for 
our purposes. Tools to provide Web-based access to the 
MySQL server are also available. The tables generated by 
the GALICS and MOMAF post-processing are stored in MySQL 
tables. Our database input and testing is done using sev- 
eral short scripts in Perl that use the Perl/DBI module. 
The Web front-end uses PHP4 to pass SQL queries to the 
MySQL database. Query outputs can either be displayed as 
an HTML table within the browser or downloaded to a local
file. In this section we briefly describe the database. A
quick-start guide, sample queries and descriptions of the
various fields in each table are available at the GALICS web-site
(http://galics.iap.fr).

From a single dark matter simulation, each choice in the
list of the input parameters corresponds to a GALICS
post-processing with its specific Archives and Products, which in
turn corresponds to a single MySQL database. The information
is stored in a structure derived from the usual
analysis in terms of entities, attributes and relationships,
and designed to minimise storage space and maximise
query speed. Each MySQL database is consequently organised
in three MySQL tables for the Archives, respectively
called the box, halo, and galaxy tables, and numerous cone
tables for the Products. The database scheme is illustrated
in Figure 15.

(i) The box table contains general information about 
mean quantities at each snapshot of the simulation, such 
as the cosmic time, corresponding redshift, total number of 
halos and galaxies within the box, and integrated cosmic 
quantities such as the cosmic star formation rate, cold gas 
content, hot gas content, etc. 

(ii) The halo table contains information on the halos at 
each time-step. Each halo is identified by a unique ID in the 
simulation. This information deals with dark matter (e.g. 
mass of the halo, virial radius, circular velocity) as well as 
the baryonic content of the halos (e.g. mass of hot gas and its
metallicity). On top of this, we include spatial information,
namely positions and velocities of the centres of mass of the 
halos, and hierarchical information, that is, merging history 
links. This information is in principle enough for one to run 
one's own semi-analytic model on our dark matter simula- 
tions, and thus freely test new recipes and compare results 
with GALICS. 

(iii) The galaxy table contains the physical information 
we compute for galaxies: stellar masses, star formation rates, 
gas contents, rest-frame absolute magnitudes in a variety of 
filters, etc. Each galaxy is identified by a unique ID in the 
simulation. 

(iv) The cone tables contain the positions of galaxies dis- 
tributed in a mock catalogue, along with their apparent 
magnitudes in a variety of filters. Each galaxy is identified 
by a unique ID in the cone. However, because of transverse 
replication, different cone galaxy ID's can point to the same 
galaxy ID in the simulation. There are several cone tables, 
corresponding to different random seeds for the box shuffling 
process (which mimics, to some extent, cosmic variance), or 
to different field sizes. 

Of course, the information included in the four tables 
is usable simultaneously in the queries, since the ID's of ha- 
los and galaxies are shared by the tables of the database 
and allow one to pass information from one table to an- 
other. Companion information on mock spectra and images 
is stored as FITS files. The rest-frame spectra are related to 
the galaxies in the galaxy table, whereas the images are re- 
lated to a particular cone table. The observer-frame spectra 
are related both to galaxies and to the cone from which the 
galaxies are identified. 
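The way shared IDs link the tables, and the fact that several cone entries can point to the same simulation galaxy, can be illustrated with a small Python sketch. It uses the standard sqlite3 module as a stand-in for the MySQL server. The column names gal_id, halo_id, cone_id, timestep, m_vir, tot_mstar and app_redshift follow the example queries of Appendix A; the `redshift` column of the box table, the reduced list of columns and the toy values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE box    (timestep INTEGER PRIMARY KEY, redshift REAL);
CREATE TABLE halo   (halo_id INTEGER PRIMARY KEY, timestep INTEGER, m_vir REAL);
CREATE TABLE galaxy (gal_id INTEGER PRIMARY KEY, halo_id INTEGER,
                     timestep INTEGER, tot_mstar REAL);
CREATE TABLE cone_001 (cone_id INTEGER PRIMARY KEY, gal_id INTEGER,
                       app_redshift REAL);
""")
conn.execute("INSERT INTO box VALUES (70, 0.0)")
conn.execute("INSERT INTO halo VALUES (1, 70, 1.0e12)")
conn.execute("INSERT INTO galaxy VALUES (10, 1, 70, 5.0e10)")
# Two cone entries pointing to the same simulation galaxy (a replication).
conn.executemany("INSERT INTO cone_001 VALUES (?, ?, ?)",
                 [(1000, 10, 0.05), (1001, 10, 0.05)])

# List the simulation galaxies that appear more than once in the cone.
replicated = conn.execute("""
    SELECT gal_id, COUNT(*) FROM cone_001
    GROUP BY gal_id HAVING COUNT(*) > 1
""").fetchall()
print(replicated)
```

A query of this kind gives a simple way to check how strongly a given sample is affected by replication before measuring statistics on it.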

The GALICS web-site also contains a hierarchical query
page which allows the user to retrieve hierarchical information
for any galaxy in mock catalogues or snapshots. At the
moment, this page contains 3 scripts. The first one allows
one to view the full merging tree of a given galaxy, identified
by its unique ID. The second one allows one to follow the
evolution of selected properties of a galaxy along its merging
history tree. Here, three options are available: (i) one can
follow properties along the main branch which links a galaxy
to its most massive progenitor at each merger; (ii) one can
follow the most massive branch, which links the most massive
progenitors at each timestep together; or (iii) one can
perform a sum of additive properties on all progenitors at
each timestep. Each of these options is a different way to
retrieve partial information contained in the full merging
history tree of a galaxy. The third script allows one to get the
list of descendants or progenitors (at any redshift) of a sample
of galaxies selected with any set of criteria. An example
of use of this powerful script can be found in Blaizot et al.
(2004). These scripts allow, for the first time, the systematic
exploration of the evolution of galaxies in the framework of
hierarchical galaxy formation.

Figure 15. Data model of the GALICS/momaf database. There
are four layers of information corresponding to four tables. The
box table contains information on mean quantities computed for
each time-step, e.g. redshift, cosmic time, mean SFR. The halo
table contains information on the DM halos of each time-step,
including their dynamical properties and merging history. The
galaxy table contains information about the physical properties
of galaxies at each time-step, including rest-frame magnitudes and
spectra. The cone table contains information on the mock observing
cone, mainly positions and apparent magnitudes. Companion
FITS files include rest-frame spectra, pre- or post-observing
images, and observer-frame spectra.
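The three options offered by the second script can be made concrete with a small Python sketch working on a toy merging tree. This is only an illustration of the definitions given above, not the code of the web-site scripts; the data layout (a dictionary mapping each galaxy ID to its timestep, stellar mass and descendant) and the function names are hypothetical.

```python
from collections import defaultdict

# gal_id -> (timestep, stellar mass, descendant id or None); toy values.
TREE = {
    1: (3, 10.0, None),
    2: (2, 6.0, 1),
    3: (2, 3.0, 1),
    4: (1, 2.0, 2),
    5: (1, 1.0, 2),
    6: (1, 2.5, 3),
}


def direct_progenitors(gal_id):
    """Galaxies whose descendant pointer is gal_id."""
    return [g for g, (_, _, desc) in TREE.items() if desc == gal_id]


def all_progenitors_by_timestep(root):
    """Group the root and all its progenitors by timestep."""
    by_step = defaultdict(list)
    stack = [root]
    while stack:
        g = stack.pop()
        by_step[TREE[g][0]].append(g)
        stack.extend(direct_progenitors(g))
    return by_step


def main_branch(root):
    """(i) Follow the most massive direct progenitor at each merger."""
    branch, g = [], root
    while g is not None:
        branch.append(g)
        progs = direct_progenitors(g)
        g = max(progs, key=lambda p: TREE[p][1]) if progs else None
    return branch


def most_massive_branch(root):
    """(ii) Most massive progenitor at each timestep, wherever it sits."""
    by_step = all_progenitors_by_timestep(root)
    return [max(by_step[t], key=lambda p: TREE[p][1])
            for t in sorted(by_step, reverse=True)]


def summed_mass(root):
    """(iii) Sum of an additive property over all progenitors per timestep."""
    by_step = all_progenitors_by_timestep(root)
    return {t: sum(TREE[g][1] for g in gs) for t, gs in by_step.items()}


print(main_branch(1), most_massive_branch(1), summed_mass(1))
```

In this toy tree options (i) and (ii) differ at the earliest timestep: the most massive progenitor at that time (galaxy 6) does not belong to the main branch, which passes through galaxy 4.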



6 CONCLUSIONS 

In this paper, we presented the Mock Map Facility, which takes
the results of our GALICS hybrid model of hierarchical galaxy
formation to make mock galaxy samples. Our method involves
the construction of observing cones by integrating
through the snapshots of the $N$-body simulation, and by
using the properties of galaxies as they are computed by
the GALICS post-processing. This technique builds on the
simulation and is affected by the shortcomings of the latter
(mass resolution, and absence of rare objects due to the
limited size of the box). It also incorporates shortcomings due
to radial replication along the line-of-sight, and, for large
solid angles, transverse replication. We introduced box
reshuffling to minimise replication effects. The price of this
technique is the loss of some signal for the correlation
functions (both 2D and 3D) on distances smaller than the size
of the box. This loss is generally not larger than 10%. Of
course, there is no signal on distances larger than the size
of the box, and finite volume effects have been shown to
introduce a significant (but well understood) bias on angular
correlation functions.

For the purpose of analysing the limitations of our
method, we compared predictions of GALICS/momaf to various
observations. We showed that the model agrees well with
K-band counts and redshift distributions. And we showed
that within finite volume effects, the model also agrees
well with the APM angular correlation function. These results
have been obtained with the same model as used in
Blaizot et al. (2004), which showed good agreement with the
properties of Lyman break galaxies at $z \sim 3$. This shows
that our mock catalogues can readily be used for a variety
of scientific investigations.

From the mock catalogues of the observing cones, we 
show how to make "realistic" mock images. Since our GAL- 
ICS post-processing involves multi-wavelength information 
from the UV to the sub-millimetre range, our mock im- 
ages are produced through a wide range of standard filters. 
These field images can be observed through any instrument 
simulator. The technique is able to produce input lists for 
SkyMaker. Instrument simulators adapted to observations at 
infrared and sub-millimetre wavelengths can also be used. 

The database produced by the GALICS and MOMAF post- 
processing is quite large, and has to be stored in such a way 
that easy access to relevant information is provided. We put 
the results into a relational database structure to which SQL 
queries can be passed through a simple Web interface. This 
structure has a number of well-known advantages: it opti- 
mises storage space, it makes access to the relevant informa- 
tion very easy, it is able to deal with simultaneous queries 
and updates, etc. The results of GALICS (physical proper- 
ties, rest-frame magnitudes) and MOMAF (observable prop- 
erties, apparent magnitudes) are stored in this database, and 
linked together through the standard system of a relational 
database model. FITS files of mock images and spectra are
also available, linked from the database.

The content of the database can be used for several 
purposes. For instance: 

• comparison of mock predictions with observations 
through the production of a mock survey that can be pro- 
cessed with the same data processing pipeline as the actual 
survey; 

• elaboration of observing strategies for forthcoming 
satellite missions and ground-based instruments; 

• benchmark for data processing pipelines: a database
populated with GALICS sources is a valuable "test set" on
which to base and test the various techniques and algorithms
for data reduction and analysis for the next generation of
astronomical instrumentation; the database includes the
positions and magnitudes of the galaxies that are put into the
mock images, and can be used as a "truth table" that has
to be recovered by the data processing software;

• creation of customised galaxy samples for comparison 
with other models, or observational data. 

We are considering improvements to this prototype
database. They can develop along three axes: (i) In the
mid-term future, the foreseen computer performances make
replications unavoidable if a sufficient level of mass resolution for
galaxy studies has to be attained. However, the improvement
of the simulations will result in larger boxes that will
decrease the number of radial and transverse replications,
and be able to include rarer objects. (ii) The improvement
of the physics within the simulations will also make better
mass resolution possible, and will (hopefully) produce better
results. There is no doubt also that the semi-analytic
recipes have to be improved. The same cone building technique
will also be used for converting the outputs of $N$-body
simulations + hydrodynamics into mock observations. (iii)
The database prototype that has been presented here will be
enhanced to make it compatible with the data and metadata
standards that are now being developed as part of the
theoretical virtual observatory. The present MOMAF will form
a valuable test-bed for testing the integration of theoretical
data from simulations into the theoretical virtual observatory,
which forms a part of the global Astronomical Virtual
Observatory effort.

APPENDIX A: USING THE DATABASE

In this appendix, we briefly illustrate how the database can 
be used to interpret observational data in the paradigm of 
hierarchical galaxy formation. We give four examples which 
exemplify the kind of information that can be exploited: 

(i) synthesis of a volume-limited sample of galaxies, for 
instance at z ~ 0, 

(ii) synthesis of a magnitude-limited sample of galaxies, 
and related multi-wavelength information, 

(iii) search for 2D and 3D spatial information (e.g. for
redshift distribution, clustering), and correlation of properties
with it,

(iv) search for merging history trees within hierarchical
galaxy formation (e.g. in what type of galaxy does the
material of a given high-redshift galaxy end up at $z \sim 0$? How
many progenitors does a galaxy at $z \sim 0$ have?).

For each example, we give a typical SQL query that
returns the requested subsample by querying the database. We
refer the reader to the Web page (http://galics.iap.fr/)
for additional examples, and a simple introduction to SQL
syntax.

A1 Volume-limited samples

It is possible to query the database to list a series of physical
properties for a subsample of galaxies with sophisticated
selection criteria. As an example, select 100 galaxies at random
in the $z = 0$ snapshot (that corresponds to timestep
70), with the requirement that their Johnson B-band absolute
magnitude is brighter than $-20$, and their dispersion
velocity is larger than 200 km/s. We are also interested in
obtaining their absolute K-band magnitude, $B - K$ colour,
morphological types and total stellar mass.

> SELECT gal_id, type_B2D_lum, tot_JOHNSON_B,
>        tot_speed, tot_JOHNSON_K,
>        tot_JOHNSON_B-tot_JOHNSON_K, tot_mstar
> FROM galaxy
> WHERE timestep=70
> AND tot_JOHNSON_B < -20
> AND tot_speed > 200
> ORDER BY RAND()
> LIMIT 100

The last two commands place the list of galaxies recovered 
by the query in random order, and then limit the output to 
the first 100 rows. This example takes less than 1 second to 
run. 
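For readers who wish to experiment with this type of query offline, the following Python sketch runs an equivalent selection on a toy table with the standard sqlite3 module. This is an illustration under stated assumptions: SQLite is used in place of the MySQL server, so the random ordering is written `RANDOM()` instead of `RAND()`; the table holds ten made-up galaxies, and the output is limited to 3 rows instead of 100.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE galaxy (
    gal_id INTEGER PRIMARY KEY, timestep INTEGER, type_B2D_lum REAL,
    tot_JOHNSON_B REAL, tot_JOHNSON_K REAL, tot_speed REAL, tot_mstar REAL)""")
# Toy rows: B magnitudes from -18.0 to -22.5, speeds from 150 to 375 km/s.
rows = [(i, 70, 0.5, -18.0 - 0.5 * i, -21.0 - 0.5 * i, 150.0 + 25.0 * i, 1e10)
        for i in range(10)]
conn.executemany("INSERT INTO galaxy VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

sample = conn.execute("""
    SELECT gal_id, type_B2D_lum, tot_JOHNSON_B, tot_speed, tot_JOHNSON_K,
           tot_JOHNSON_B - tot_JOHNSON_K, tot_mstar
    FROM galaxy
    WHERE timestep = 70 AND tot_JOHNSON_B < -20 AND tot_speed > 200
    ORDER BY RANDOM()   -- SQLite spelling; MySQL uses RAND()
    LIMIT 3
""").fetchall()
for row in sample:
    print(row)
```

Only five of the ten toy galaxies pass both cuts, and the query returns three of them in random order, so the rows printed differ from one run to the next.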

A2 Magnitude-limited samples

Another type of query is to select galaxies according to their 
apparent magnitudes in order to mimic an observational 
sample or to predict what a forthcoming survey will yield. 
The selected sample can then be studied in the explicit cos- 
mological context of GALICS, and the physical properties of 
the selected galaxies can be retrieved easily to gain insight 
on the nature of the "observed" objects. An example of using
mock catalogues to interpret observational data is given in
Blaizot et al. (2004). A crucial issue in observational galaxy
formation studies is the identification of counterparts at any 
wavelength of galaxies observed through any given filter. It 
is often quite a challenge, for example, to identify the opti- 
cal counterparts of far infrared sources, observed with low 
angular resolution. The GALICS database provides a power- 
ful tool to address these questions as it predicts emission 
properties of galaxies from the UV to the sub-mm and gives 
the opportunity to build corresponding mock maps. 

An example SQL query to retrieve a magnitude-limited
sample in a 1 deg$^2$ cone is given below, for galaxies brighter
than $I_{AB} = 22.5$. We are interested in their apparent
$B - K$ colour (in the AB system), their total stellar mass, and
the virial mass of their host halo. Such a query requires
information that is present not only in the cone table, but
also in the galaxy table and the halo table. It requires what
is called a join in SQL syntax:

> SELECT cone_001.cone_id, cone_001.app_redshift,
>        cone_001.JOHNSON_BAB, cone_001.JOHNSON_KAB,
>        cone_001.JOHNSON_BAB-cone_001.JOHNSON_KAB,
>        galaxy.tot_mstar, halo.m_vir
> FROM galaxy, cone_001, halo
> WHERE cone_001.JOHNSON_IAB < 22.5
> AND cone_001.gal_id = galaxy.gal_id
> AND halo.halo_id = galaxy.halo_id

This example query runs in about 30 seconds, and re- 
turns information (7 columns) for about 31000 galaxies. 
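The same join can be written with explicit `JOIN ... ON` clauses, which some readers may find easier to follow than the comma-separated form. The Python sketch below runs it on a toy database with the standard sqlite3 module. SQLite stands in for MySQL, the tables contain only the columns needed here, and all values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE halo   (halo_id INTEGER PRIMARY KEY, m_vir REAL);
CREATE TABLE galaxy (gal_id INTEGER PRIMARY KEY, halo_id INTEGER,
                     tot_mstar REAL);
CREATE TABLE cone_001 (cone_id INTEGER PRIMARY KEY, gal_id INTEGER,
                       app_redshift REAL, JOHNSON_BAB REAL,
                       JOHNSON_KAB REAL, JOHNSON_IAB REAL);
INSERT INTO halo VALUES (1, 1e12), (2, 5e11);
INSERT INTO galaxy VALUES (10, 1, 5e10), (11, 2, 1e10);
INSERT INTO cone_001 VALUES (100, 10, 0.3, 23.0, 20.5, 21.8),
                            (101, 11, 0.8, 25.0, 22.0, 23.4);
""")

# cone -> galaxy through gal_id, then galaxy -> halo through halo_id.
result = conn.execute("""
    SELECT c.cone_id, c.app_redshift, c.JOHNSON_BAB, c.JOHNSON_KAB,
           c.JOHNSON_BAB - c.JOHNSON_KAB, g.tot_mstar, h.m_vir
    FROM cone_001 AS c
    JOIN galaxy AS g ON c.gal_id = g.gal_id
    JOIN halo   AS h ON g.halo_id = h.halo_id
    WHERE c.JOHNSON_IAB < 22.5
""").fetchall()
print(result)
```

Only the first toy galaxy is brighter than the magnitude limit, so a single row with the seven requested columns is returned.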

The same type of selection can be used to work the
other way around: one can select galaxies according to their
physical properties or their dark matter halo properties, and
extract their spatial distribution and apparent magnitudes
from the cone.

A3 Spatial information 

Spatial information can be retrieved the same way as above,
for galaxies in mock catalogues. Consider the following
query:

> SELECT right_ascension, declination, app_redshift
> FROM cone_001
> WHERE JOHNSON_IAB < 22.5

This produces a table with the angular coordinates and
apparent redshifts of all the mock galaxies brighter than
22.5 in the $I_{AB}$ band within a 1 deg$^2$ field. There are again
about 31000 such galaxies, and the query here runs in about
5 seconds.
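Once such a table is downloaded, a redshift distribution or pair separations for a clustering measurement follow from a few lines of code. The Python sketch below is a minimal illustration; it assumes that the coordinates are given in degrees, and the function names, the bin width and the sample values are hypothetical.

```python
import math
from collections import Counter


def redshift_histogram(redshifts, bin_width=0.1):
    """Count galaxies per redshift bin; keys are integer bin indices."""
    return Counter(int(z // bin_width) for z in redshifts)


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation (haversine formula), all angles in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


# Toy rows in the same column order as the query output.
rows = [(10.0, 0.0, 0.05), (10.5, 0.2, 0.12), (10.9, 0.9, 0.18),
        (10.2, 0.4, 0.95)]
hist = redshift_histogram([z for _, _, z in rows])
print(sorted(hist.items()))
print(angular_separation_deg(10.0, 0.0, 10.0, 1.0))
```

Redshifts falling exactly on a bin edge may land in either neighbouring bin because of floating-point rounding, which is harmless for a histogram but worth keeping in mind.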

A4 Hierarchical evolution 

The GALICS project gives for the first time the opportunity 
to interpret observational data within the paradigm of hi- 
erarchical galaxy formation in a systematic way. One of the 
most important features of this theoretical framework is the 
notion of galaxy merging history tree. Going up or down this 
tree allows one to investigate the properties of the
progenitors or descendants of any given galaxy at any redshift, as
well as the mass build-up of that galaxy. In the galaxy table,
each galaxy has a pointer towards its unique descendant at
the next timestep. This minimal information is sufficient to
reconstruct any merging history tree, whether forward (the
list of descendants, that is a single branch as time flows) or
backwards (the list of progenitors, that may be a full tree
with many branches as we look back). The number of merging
events for the progenitors of the galaxy whose ID is (the
character string) xxyyyyyzzz is easily obtained through the
following query, as well as the ID of its descendant at the
next timestep (for all timesteps but the last one):

> SELECT nb_merge, daughter_num FROM galaxy
> WHERE gal_id='xxyyyyyzzz'

It is necessary to run the above query recursively to
build up the full merging history trees. On the GALICS
web-site we provide PHP scripts that generate such recursive
queries and pass them to the database server. Once a galaxy
ID is supplied by the user through the Web interface, the
IDs of its progenitors and descendants, as well as their
properties, are recovered through the "recursive query" page. An
interesting option allows the user to obtain the sum of the
(additive) properties of all the progenitors. In such a way,
the evolution of the total Star Formation Rate or the total
stellar mass in all the progenitors can be easily followed.
Examples of such recursive queries can be found in
GALICS II and GALICS III.
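The principle of these recursive queries can be sketched in Python with the standard sqlite3 module. This is not the PHP code used on the web-site. The sketch assumes that `daughter_num` stores the `gal_id` of the descendant (NULL at the last timestep), uses SQLite in place of MySQL, and works on a toy table of six galaxies with made-up IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE galaxy (gal_id TEXT PRIMARY KEY, "
             "timestep INTEGER, daughter_num TEXT)")
conn.executemany("INSERT INTO galaxy VALUES (?, ?, ?)", [
    ("a", 1, "d"), ("b", 1, "d"), ("c", 1, "e"),
    ("d", 2, "f"), ("e", 2, "f"),
    ("f", 3, None),
])


def descendants(gal_id):
    """Forward walk: a single branch, one query per timestep."""
    chain = []
    while True:
        row = conn.execute("SELECT daughter_num FROM galaxy WHERE gal_id = ?",
                           (gal_id,)).fetchone()
        if row is None or row[0] is None:
            return chain
        gal_id = row[0]
        chain.append(gal_id)


def progenitors(gal_id):
    """Backward walk: one query per galaxy, recursing into each progenitor."""
    found = []
    rows = conn.execute("SELECT gal_id FROM galaxy WHERE daughter_num = ? "
                        "ORDER BY gal_id", (gal_id,)).fetchall()
    for (parent,) in rows:
        found.append(parent)
        found.extend(progenitors(parent))
    return found


print(descendants("a"))
print(progenitors("f"))
```

The forward walk issues one query per timestep, whereas the backward walk issues one query per galaxy in the tree, which is why building full merging history trees is the more expensive of the two operations.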



